Behavior Research Methods 2009, 41 (4), 991-1008 doi:10.3758/BRM.41.4.991

A comparative quantitative analysis of Greek orthographic transparency

ATHANASSIOS PROTOPAPAS Institute for Language and Speech Processing, Maroussi, AND

ELENI L. VLAHOU Institute for Language and Speech Processing, Maroussi, Greece and University of Crete, Rethymno, Greece

Orthographic transparency refers to the systematicity in the mapping between orthographic letter sequences and phonological sequences in both directions, for reading and . Measures of transparency previously used in the analysis of of other languages include regularity, consistency, and entropy. However, previous reports have typically been hampered by severe restrictions, such as using only monosyl- lables or only word-initial . Greek is sufficiently transparent to allow complete sequential alignment between graphemes and phonemes, therefore permitting full analyses at both letter and grapheme levels, using every word in its entirety. Here, we report multiple alternative measures of transparency, using both type and token counts, and compare these with estimates for other languages. We discuss the problems stemming from restricted analysis sets and the implications for psycholinguistic experimentation and computational modeling of reading and spelling.

Alphabetic orthographies differ in their degree of also affects reading. That is, words with predictable pro- transparency—that is, in the systematicity of the map- nunciation but unpredictable spelling are read and rec- ping between letter sequences and phoneme sequences. ognized more slowly than words with predictable spell- Inconsistencies in the sound–spelling mappings arise ing (Grainger & Ziegler, 2008; McKague, Davis, Pratt, when single orthographic units have multiple pronuncia- & Johnston, 2008). However, in a review of studies on tions or single phonological units have multiple . feedback inconsistency effects, Kessler, Treiman, and Quantitative assessments of such ambiguities have been Mullennix (2008) pointed out a number of methodologi- carried out in several languages with alphabetic ortho- cal shortcomings that need to be addressed before a final graphic systems, both in the feedforward direction—that conclusion can be reached. To pursue these issues in addi- is, from to phonology, as needed for reading tional languages, detailed quantification of orthographic aloud printed words—and in the feedback direction, from consistency in both directions is necessary. phonology to orthography, as needed for spelling (e.g., Ambiguity does not affect all sublexical units equally. Borgwaldt, Hellwig, & De Groot, 2004, 2005; Treiman, In more opaque orthographies, smaller units tend to be Mullennix, Bijeljac-Babic, & Richmond-Welty, 1995; less consistent than larger units (Ziegler & Goswami, Ziegler, Jacobs, & Stone, 1996; Ziegler, Stone, & Jacobs, 2005). For example, graphemes1 are less consistent than 1997). orthographic bodies (spellings of a syllabic rime—i.e., of The study of orthographic transparency is important the nuclear vowel and any consonants that follow it) in En- for theoretical and practical reasons, since ambiguous glish monosyllables (Treiman et al., 1995). There is thus a mappings have been found to affect reading and spell- functional pressure for readers to develop both small-unit ing performance (Spencer, 2007, in press). For example, and large-unit recoding strategies. As the grain size grows, feedforward-inconsistent orthographic units (i.e., letter the number of distinct orthographic units rises. This gran- sequences that can be pronounced in more than one way) ularity problem is more pervasive in opaque orthogra- slow down word naming (Burani, Barca, & Ellis, 2006; phies, whereas readers of more transparent orthographies Jared, 2002; Treiman et al., 1995), whereas ambiguities in can focus on finer grain sizes (Ziegler & Goswami, 2005). the feedback direction affect spelling performance (Burt This assertion remains to be substantiated with specific & Blackwell, 2008; Lété, Peereman, & Fayol, 2008). A estimates of the transparency and granularity of specific counterintuitive finding is that feedback inconsistency orthographic systems.

A. Protopapas, [email protected]

991 © 2009 The Psychonomic Society, Inc. 992 PROTOPAPAS AND VLAHOU

In the present work, we address two issues. First, we et al., 2001) when presented with monosyllabic words provide a systematic quantitative exposition of transpar- from each language. By definition, the pronunciation ency in the Greek orthography. The goals of this presen- produced by the sublexical route of the DRC model will tation are to support researchers working on Greek with be wrong for every irregular word, since it will tend to information relevant to stimulus selection and experimen- regularize it according to the specified conversion rules tal design and to provide researchers working in other or- of the language. Therefore, an evaluation of the model’s thographies with a comparison reference. Second, we use performance provides an index of the degree of regularity Greek as a test case to examine and evaluate a multitude of the language. Using this rule-based approach, Ziegler of approaches to the quantification of orthographic trans- et al. (2000) found that the percentage of correct rule ap- parency. By comparing and contrasting various alternative plication is 90.4% for German as compared with 79.3% methods that have been proposed in the literature, we are for English monosyllabic words. Because this approach able to test certain assumptions underlying them and in- has not yet been applied to polysyllabic words, it is un- dicate weaknesses that may limit their applicability. Thus, known whether these are valid estimates for more repre- our analysis suggests, directly or indirectly, ways in which sentative samples. the quantification of orthographic transparency may be Consistency. As an alternative to regularity, the con- improved in future research. sistency approach forgoes the notion of rules. Consistency In the following sections, we first will present existing refers to the (lack of) variability in the correspondences approaches to the quantification of orthographic transpar- between the phonological and orthographic units of a lan- ency in different alphabetic writing systems, discussing guage. For example, the consistency of a grapheme (in their methodological strengths and weaknesses. We then the feedforward direction; or a phoneme, in the feedback will introduce the most important aspects of the Greek direction) decreases as the number and relative frequency orthography. We will report analyses of the transparency of its corresponding alternative pronunciations (or spell- of Greek orthography, comparing and contrasting cal- ings, respectively) increases (e.g., Lété et al., 2008; Perry, culations of regularity and consistency, on the basis of a Ziegler, & Coltheart, 2002). Consistency computations word-form list from a representative corpus of contempo- can be performed at the grapheme–phoneme level or at rary written Greek texts. We will identify the grapheme– larger grain sizes and can be dichotomous or graded. In phoneme level of analysis as the most appropriate for dichotomous analyses, a word (or smaller size unit) is con- Greek, and we will present statistics relating individual sidered consistent when there is only one possible map- phonemes and graphemes, as well as an ordered set of ping for it or inconsistent when alternative mappings are rules maximally capturing graphophonemic transcription. possible. In graded analyses, the measure of consistency Finally, we will discuss the implications of our findings quantifies ambiguity by taking into account the relative for cross-linguistic evaluation of orthographic transpar- frequency of alternative mappings and is expressed as the ency, for learning to read and write in Greek, and for mod- proportion of dominant mappings over the total number of eling reading Greek. occurrences of the base unit of analysis. Consistency estimation based on grain sizes larger than Quantitative Indices of the phoneme–grapheme is a common approach for the Orthographic Transparency English language because taking into account larger parts Regularity. The regularity approach assumes the theo- of the syllable reduces ambiguity (Kessler & Treiman, retical position that there are regular mappings, governed 2001; Peereman & Content, n.d.; Treiman et al., 1995; by symbolic transcription rules, and irregular mappings, Ziegler et al., 1997). Treiman et al. performed statisti- which violate the rules. Under this framework, the prob- cal analyses of the spelling-to-sound relations of English lem consists in the specification of a set of rules that relate monosyllabic words with a consonant–vowel–consonant individual graphemes to the corresponding phonemes (for (CVC) structure. They found that the vowel unit is highly the feedforward direction, or the reverse for the feedback inconsistent when examined individually or in combina- direction). In cases in which the mapping deviates from tion with the initial consonant (head). But when the vow- one-to-one—for example, when a single grapheme can els were examined together with the following consonants have multiple pronunciations—the most frequent map- (forming rimes), the appeared to be ping is considered regular, and the others irregular. Regu- much more consistent. Kessler and Treiman extended the lar words are words whose pronunciation or spelling is analysis in both directions (reading and spelling). They correctly produced by the grapheme–phoneme correspon- introduced conditional consistencies and permutation dence rules of a language, whereas irregular or exception tests of significance to address questions such as whether words are words whose pronunciation or spelling cannot the rime is processed as a whole or whether the influ- be predicted from these rules (Coltheart, Rastle, Perry, ence of the intrarime units is symmetrical (i.e., whether Langdon, & Ziegler, 2001; Ziegler, Perry, & Coltheart, the vowel and the coda improve the consistency of each 2003). Regularity is thus conceptualized as a categorical other). They concluded that rimes are not processed as in- distinction (Zevin & Seidenberg, 2006). dividual units. Rather, the basic processing seems to occur Ziegler, Perry, and Coltheart (2000) compared the de- at a phonemic– graphemic level that takes into account the gree of regularity of English and German by examining context in which each phoneme–grapheme is found. the pronunciations produced by the sublexical reading On the basis of the rime–body level and using a di- route of the dual-route cascaded (DRC) model (Coltheart chotomous classification, Ziegler et al. (1996; Ziegler ORTHOGRAPHIC TRANSPARENCY IN GREEK 993 et al., 1997) performed bidirectional statistical analyses tropy is an information-theoretic notion that quantifies of French and English monosyllabic monomorphemic uncertainty (i.e., lack of information; Shannon, 1948, words. A word was considered consistent if there was a 1950). In the context of graphophonemic transparency, one-to-one correspondence between the word’s spelling entropy quantifies ambiguity in the prediction of letters or body and its phonological body. The results showed that graphemes by phonemes and vice versa. If a given graph- the degree of inconsistency differs between these lan- eme (phoneme) maps unambiguously to a specific pho- guages and within each language between the feedfor- neme (grapheme), the mapping is absolutely certain (i.e., ward and feedback directions. From the spelling point of there is no ambiguity); hence, the corresponding entropy view, both English and French can be described as opaque is zero. Entropy is high for graphemes (phonemes) with languages, since 79.1% of the French words and 72.3% many alternative pronunciations (spellings), especially of the English words were feedback inconsistent. From when there is not a single dominant mapping. the reading perspective, 12.4% of the French words and For any unit of orthographic (or phonological) repre- 30.7% of the English words were feedforward inconsis- sentation that maps onto n phonological (orthographic) tent. Therefore, both languages are more consistent in the alternatives with probability pi for the ith alternative, its feedforward direction, but the asymmetry is much higher entropy (H) is calculated as the negative sum, over the for French. alternative mappings, of the products of each probability A severe limitation of the aforementioned studies is the times its logarithm: computation of consistency values solely on the basis of n monosyllabic words. Consistency measures based on a Hpp £ iilog2 . nonrandom subset of words in a language may not con- i1 stitute reliable estimates of the sound–spelling relations Using base-2 logarithms for the calculation results in a of this language as a whole, insofar as polysyllables may quantification expressed in bits. For example, the entropy have a different structure from monosyllables or may of the [D] phoneme mentioned above would be [0.855  be otherwise biased (Borgwaldt et al., 2004; Kessler & log (0.855) 0.145 log (0.145)] 0.597 bits. To cal- 2   2  Treiman, 2001; Lété et al., 2008). Moreover, restriction culate the total entropy associated with an entire set of of computations to monosyllables limits the potential for units (letters, graphemes, or phonemes), the contribution cross-linguistic comparisons, because languages differ of each individual unit is weighted by its relative frequency in the proportion of words that are monosyllables and, of occurrence before it is added to the total. Calculated in possibly, in the representativeness of monosyllables with the same way, the entropy of [!] equals 0.827 bits. respect to the full spectrum of orthographic mappings. Borgwaldt et al. (2004, 2005) performed analyses of Finally, analyses at the rime–body level may be justified entropy for English, Dutch, German, French, Hungar- only for languages such as English, where smaller grain ian, Italian, and Portuguese. To overcome the limitations sizes would lead to very low consistency estimates. In more of restricting analysis to monosyllables, they focused transparent orthographies, grapheme–phoneme mappings on word-initial sound–spelling correspondences, using are more consistent across the entire vocabulary, high- all the words in each language. The restriction to word- lighting the importance of selecting a cross-linguistically initial mappings was dictated by severe constraints on appropriate level of analysis. These issues are addressed defining and segmenting the units of analysis in a practi- in the present study by a comparative analysis at different cal and cross-linguistically uniform manner, due to dif- grain sizes, using an unabridged lexical database. ficulties in identifying the borders of graphemes within Entropy. The calculation of consistency as the propor- words, at least in some languages. Borgwaldt et al. showed tion of majority mappings suffers from the inability to that none of the orthographies examined approached the discriminate between cases with many and few alterna- ideal one-to-one mapping between letters and sounds and tives and between nondominant mappings with substan- that word-initial letter entropy is significantly correlated tial and negligible proportions. For example, in Greek, with word-naming latency in Italian, Dutch, and English, the phoneme [D]2 can be spelled as either ,o€- (85.5%) three languages varying widely in orthographic transpar- or ,oo- (14.5%). The phoneme [!] can be spelled as ency. However, it remains to be empirically substantiated ,O- (85.0%), ,›- (7.0%), ,- (6.9%), or ,e-, ,O-, ,Oe-, whether word- initial mappings constitute an unbiased rep- and other combinations with very low probability (less resentative sample of all mappings. This is an important than 1% each). According to the consistency index, these issue that we will address empirically in the present study. phonemes are equally consistent at about 85%. However, Type versus token counts. An issue warranting fur- their mappings are not equally unpredictable, because ther scrutiny concerns the nature of the counts entering there is only one significant nondominant option for [D] the calculations of transparency indices. In principle, but two for [!] (among several minor ones). The consis- consistency counts of sublexical units can be performed tency index cannot express this difference in the ambigu- on the basis of either word form or lemma databases and ity of mapping between the two phonemes. on either type or token frequency counts. The difference This shortcoming can be remedied by resorting to en- in database contents concerns items related via inflec- tropy, a more sophisticated index of consistency, which tional morphology. That is, word form databases contain assesses the same underlying concept but can take into morphological variants of the same lemma, whereas account the complications arising from the entire distri- lemma databases contain only a “base” form. Type counts bution of mappings, and not only the dominant ones. En- of sublexical units consist in the number of items (word 994 PROTOPAPAS AND VLAHOU forms or lemmas) that contain the unit in question. Token and two ways of spelling [#] (,e- and the ,(-); counts are calculated by summing the frequency (number [R] is spelled with the digraph ,›Ú-. There are six ways to of occurrences in a text corpus) of the items that contain spell [F] (,-, ,h-, ,Ú-, ,e-, ,›-, ,Ú-). Since there are fewer the unit. consonant letters than phonemes, several consonants are The optimal choice between lemmata and word forms spelled with digraphs. For example, all voiced stops are depends on the researcher’s theoretical assumptions and spelled with a combination of the letters for the corre- research goals (Hofmann, Stenneken, Conrad, & Jacobs, sponding unvoiced stop and the nasal at the same place 2007). However, in lemma databases, the sublexical units of articulation: [J]%,Ž-, [M]%,®-, [?]%,Ž®-. Palatal of the base form that are not present in the inflected forms consonants are spelled with the letter for the correspond- are overestimated, whereas units associated with inflec- ing and one of the [F] graphemes. Despite tions are underestimated. This led Hofmann et al. to rec- these and other complications, learning to read Greek ommend using word form databases, especially when as- is considered to be relatively easy, and online resources sessing language in its natural, inflected form. are available providing the necessary information (e.g., The choice between type and token measures remains “The ” at www.xanthi.ilsp.gr/filog/ch1/ controversial, since both seem to be independently as- alphabet/alphabet.asp). sociated with lexical processing. Conrad, Carreiras, and The Greek orthography is commonly characterized as Jacobs (2008) showed that a token-based measure of transparent or shallow, despite a dearth of relevant quan- syllable frequency was associated with an inhibitory ef- titative data. In the classification of several European or- fect on lexical access in a lexical decision task, whereas thographic systems by Seymour, Aro, and Erskine (2003), a type-based measure was associated with a facilitative in the context of a cross-linguistic study on early stages of effect. To account for the contradictory findings, Conrad learning to read, Greek occupied the second position, in et al. proposed that the two kinds of measures are related order of decreasing orthogaphic transparency for reading, to different processing stages during visual word recogni- in the group of languages with relatively simple syllabic tion. In contrast, Moscoso del Prado Martín, Ernestus, and structures, after Finnish. However, there seems to be a great Baayen (2004) have proposed that a common, token-based asymmetry in the transparency of Greek orthography be- mechanism can account for both token- and type-based tween the feedforward (reading) and feedback (spelling) di- effects. They modeled Dutch past tense formation with rections. According to Porpodas (2006), in the feedforward a simple recurrent network that exhibited token-based direction “the Greek spelling system . . . [approaches] a 1:1 frequency effects and type-based analogical effects that relationship between graphemes and phonemes . . . and can closely matched the behavior of human participants. be characterized as a shallow orthography in which, as a In the aforementioned studies employing entropy, rule, pronunciation is predictable from print,” whereas in Borgwaldt et al. (2004) used word form types, whereas the feedback direction, “Greek is phonologically opaque Borgwaldt et al. (2005) used lemma types. Neither op- as there is a one-to-many phoneme– grapheme mapping tion corresponds to the cumulative experience of read- and therefore spelling cannot always be predictable from ers and spellers with the graphophonemic mappings, be- phonology” (p. 192, emphasis in original). cause the frequency of occurrence of each word in written Consistent with the notion of rule-based predictability, and spoken language was not taken into account. Insofar Petrounias (2002) has listed a set of rules for each direc- as the frequency of occurrence of a word is among the tion of conversion. However, according to Petrounias, strongest predictors of how quickly it can be recognized some of the rules apply only to words of the vernacular or read aloud (Balota, Yap, & Cortese, 2006), it might be and are often violated by words of literary or learned ori- preferable to measure indices of transparency in terms of gin. Since the origins of each word are not necessarily tokens instead of type counts, using word forms weighted clear to contemporary Greek speakers, this diachronically by the number of their occurrences in a representative text systematic distinction constitutes a source of synchronic or speech corpus. A more conservative approach would inconsistency. Petrounias lists several cases of mappings take into account both type and token frequency counts, between phonemes and letters, which can be generally following the recommendation of Hofmann et al. (2007). classified into one-to-one, one-to-many, and many-to- one. Deviations from one-to-one include both digraphs/ The Greek Orthography diphones (i.e., phonemes spelled with two or more let- There are 32 phonemes in at the level of ters, such as [?]%,Ž®-, or single letters pronounced as broad phonetic transcription (discounting idiolectal and two phonemes, such as ,Ý-%[HP]) and context-dependent optional variation), of which 5 are vowels. These are writ- transcriptions—that is, phonemes spelled differently de- ten with 24 letters (plus one final-only form), of which pending on adjacent phonemes (e.g., [W]%,À- before [S] 7 correspond to vowels in isolation. Most words can be vs. [W]%,ß- before [L]) or letters pronounced differently read correctly on the basis of the letter sequence alone, depending on adjacent letters (e.g., ,€-%[H] before ,›- vs. without the need for morphological or lexical informa- ,€-%[@] before ,e-). tion. However, spelling is more complicated, because it is Once the assumption of rules is made, the notion of determined not only by phonological identity, but also by regularity becomes relevant. Exceptions are possible only morphological type for grammatical inflections, and by when rules are defined to which they do not conform. In the historical origin for word stems tracing back to , there are some clear exceptions—typically, recent Greek. There are two letters for the vowel [L] (,›-, ,™-) loans—that violate the rules and are pronounced simi- ORTHOGRAPHIC TRANSPARENCY IN GREEK 995 larly to their foreign origin (at least by educated speak- graphemes from digraphs, so ,e[  - and ,e[ - are also bigra- ers). For example, Greek has no way to spell [JM] or phemic, pronounced [BF]. [KQ] (and these clusters do not occur within native words), In conclusion, there is a certain degree of complexity because, as was noted above, the corresponding letter and some inconsistency in Greek spelling at the level of combinations (,Ž- for [J] and ,®- for [M]; ,Ú- for [K] individual letters and phonemes, in both directions. In the and ,É- for [Q]) are used as digraphs for the voiced stops. present study, we quantify it, comparing and contrasting Therefore, words such as À(Ž®(Œ Ú( (champagne) and results from calculations of regularity, consistency, and €(Œ Ž®Úo€ (camping), which have entered the vocabu- entropy, using type and token counts from a word form lary recently, are properly pronounced [P>JM>g>] and list derived from a representative corpus of contempo- [H>JMFD], respectively, in violation of the grapheme-to- rary written Greek texts. Because of the characteristics phoneme rules. of Greek spelling, it is possible to calculate these indices The most pervasive issue of inconsistency and irregu- for all words, regardless of their length, and for all letters larity in the feedforward direction concerns the general and phonemes of every word. In this way, we can address phenomenon of CiV—that is, the occurrence of an [F] shortcomings of previous studies. Specifically, by apply- grapheme preceded by a consonant and followed by a ing all three approaches, we can examine whether differ- vowel. In every such case, there are two possible pronun- ent metrics of transparency may lead to similar conclu- ciations: one that includes the [F] and one that includes a sions and predictions. Most important, we can critically palatal consonant and no [F]. In the former case, the CiV is assess the validity of assumptions that have led research- parsed into three graphemes (C, i, V) and the i grapheme ers to perform their analyses on restricted sets of words is indeed pronounced [F], as in hŒ› (helium) pronounced (e.g., monosyllables) or parts thereof (e.g., word begin- [FIFL] and (Œ Ye( ( permission) pronounced [>"F>]. In the nings or rimes). latter case, the CiV is parsed either into two graphemes (Ci, V), the Ci part corresponding to the palatal conso- METHOD nant, as in hŒ›Å(sun) pronounced [FªL], or into three graphemes (C, i, V), in which case the palatal conso- Text Corpus nant actually corresponds to the i grapheme, as in (Œ Ye( All analyses and counts were performed on a word list (empty) pronounced [>"›>]. Note that, although homo- derived in 2006 from the Hellenic National Corpus (HNC; graphs are used in this example to make the point most Hatzigeorgiu et al., 2000; http://hnc.ilsp.gr). This is an clearly, there are in fact very few homographs of this sort. evolving corpus of a great variety of post-1990 widely For the vast majority of letter strings, there is only one cor- circulated printed Greek texts, including literary, jour- rect (i.e., word-forming) parsing of the CiV. Compare, for nalistic, legal, and other texts from online news sources, example, —É›Œ®›%[ALM!L] versus ›Œ®›%[LMFL] and newspapers, books, magazines, reports, proceedings, ßhŒe(%[WFª>] versus ÉeŒe(%[Q#IF>]. and brochures. The raw texts available at the time were Greek orthography marks lexical stress with a special tokenized into 31,363,642 white--separated tokens . As was noted by Petrounias (2002), the cur- and condensed into a list of 374,075 unique types with rent spelling convention for Greek regarding the stress associated occurrence counts (frequency). Items (0.4% diacritic contains an element of inconsistency, because its of tokens) including any characters, numerals, or application depends on the number of syllables and not symbols were rejected, as were items (5.3% of tokens) only on the presence of phonological stress. Specifically, not found in an electronic dictionary with 1,622,668 monosyllables do not bear a diacritic, whereas every word entries covering all possible morphological variants with two or more syllables must bear a diacritic. This is of inflected words (“Symfonia”; Stathis & Carayannis, phonologically appropriate in the majority of cases, be- 1999; www.ilsp.gr/correct_eng.html), resulting in a list cause most monosyllables are grammatical words that of 217,664 unique word forms (types) accounting for a attach themselves metrically to adjacent content words. total of 29,557,090 occurrences (tokens). This word list Conversely, most polysyllables are content words and bear was relatively free from spelling errors and contained phonological stress, which is always correctly marked few idiosyncratic items, such as proper names, foreign with the diacritic. However, there are exceptions: Mono- words not quite integrated as loans in the , syllabic content words, which bear phonological stress, or very low frequency words unlikely to be found in the are not marked with the diacritic due to the spelling con- dictionary. vention, whereas disyllabic function words, which do not Of the 24 letters in the Greek alphabet, seven (the vowel bear phonological stress, are written with a diacritic none- letters) have variants bearing . Specifically, all theless (see Petrounias, 2002, pp. 533–534). seven may be accompanied by an , indicat- In addition to its metrical significance, the stress dia- ing stress. Two of these may carry , indicating critic sometimes helps disambiguate graphophonemic exception from digraph combinations. Because both types mappings, because of spelling conventions concerning of diacritics are useful in phonological or lexical disam- vowel digraphs. For example, ,e- and ,eŒ- constitute biguation, and because they are dictated by current spell- graphemes and are pronounced [F], whereas ,eŒ- in- ing rules and their omission is always a spelling error, the cludes two graphemes and is pronounced [BF]. A diaeresis variants of these letters with diacritics (stress mark only, diacritic is also available, to disambiguate single-vowel diaeresis only, or both) were retained in the counts as sepa- 996 PROTOPAPAS AND VLAHOU rate letters. Including the word-final variant Á, a total of 36 ›ÅÅÅ;śżÅÅ(Œ ÁÅÅŀśÅÅÅhŒÅśÅÁÅÅŎÅ(Œ ŁřŗÅ(ŗÅÅŮśśÅÁ letters were used in the analyses. LSÅLÅ/ś>P@LFŪÅLÅPJÅ>ÅIÅLÅKÅ>ÅKMÅ!ÅLÅP Phonetic Transcription and Postprocessing A custom software processed the orthographic and The list of orthographic types from the HNC was pro- phonetic types lists with a greedy assignment algorithm cessed by a module producing phonetic representations using an expanded list of possible phoneme–grapheme of words that was developed for a Text-to-Speech project mappings originally based on Petrounias (2002, (Chalamandaris, Raptis, & Tsiakoulis, 2005). This module Table 15.2, pp. 498–502). In the vast majority of cases, has been extensively validated, is known to produce highly simply assigning the longest letter sequence matching accurate results, and has been in commercial and research the current phoneme resulted in correct parsing (i.e., one use for several years. The resulting list of phonological that accounted for all phonemes with graphemes in the types, corresponding to the HNC orthographic types, was matching list and accounting for all letters in the ortho- postprocessed to maximize uniformity by simplifying op- graphic string). A few special cases were identified and tional pronunciations that might result in unnecessary, and treated separately, such as two-phoneme letters (,Ý- and potentially misleading, complexity. Specifically, (1) all ,µ-) and context- dependent palatal allophonic variants homorganic nasal obstruents preceding voiced stops were of [F], without affecting the strictly sequential principle removed (e.g.,[J?]%[?], [KA]%[A], etc.); (2) all [JMQ] of alignment. sequences were reduced to [JQ]; and (3) all instances of [€] were converted to [J]. These cases concern optional RESULTS alternative pronunciations (phonologically and lexically nondistinctive), with variants freely alternating not only Grapheme–Phoneme Consistency and Entropy between dialects but also within dialects and talkers as a There were 118 unique grapheme–phoneme map- matter of sociolinguistic context or careful versus relaxed pings (sonographs, in the terminology of Spencer, 2009) articulation. In no instance were the simplified versions accounting for the 147,398,522 (frequency-weighted) used in the subsequent analyses inappropriate, unusual, or phoneme– grapheme pairs in the complete corpus. Addi- otherwise marked. The resulting set of 32 phonemes, 5 of tional grapheme–phoneme pairs are possible, are phono- which were vowels, suffices to accurately and completely tactically and orthographically allowed, and may possibly represent phonetically (broadly, at the surface realization) occur in very low frequency or loan words not included every Greek word in standard modern pronunciation typi- in this corpus. Table 1 shows the occurring mappings, cal of major cities such as Athens. To retain stress infor- grouped and counted by phoneme, and the proportion of mation, aiding in disambiguation, stressed vowels were occurrence for each grapheme (over the total count of the represented as separate phonemes, bringing the total num- corresponding phoneme). The proportion of the most fre- ber of phonemes to 37. quent grapheme for each phoneme is displayed first, in a In addition, all types containing CiV sequences were separate column to the left of the smaller proportions fol- identified and submitted to manual verification. The CiV lowing it. The token sum of the most frequent grapheme for pattern was found in 17.9% of the corpus types, amounting each phoneme divided by the total number of grapheme– to 6.6% of the tokens. A custom software presented each phoneme pairs in the corpus is .803. To the extent that this of the 38,926 orthographic CiV types individually while ratio can be considered to be a single-number estimate of simultaneously playing out a synthesized pronunciation of the consistency of phoneme-to-grapheme mapping, Greek the phonetic string derived from the grapheme-to- phoneme is then 80.3% consistent in the feedback (spelling) direc- module. A listener indicated manually any errors in the tion by token count. transcription. Types for which both alternative pronuncia- Table 2 shows the same mappings, grouped and counted tions were acceptable (e.g., ,Y(Œ ›o›Á-%["F>IL–LP] or by grapheme, with the corresponding proportions now re- ["›>IL–LP]) were not modified. As a result of this proce- ferring to sums over graphemes. By a similar calculation of dure, a revised list of phonetic types was generated. an estimate for the consistency of grapheme-to-phoneme mapping, Greek is 95.1% consistent in the feedforward Grapheme Alignment (reading) direction. The grapheme unit size was selected Minimal experimentation revealed that it was always pos- as most appropriate because the corresponding calcula- sible to align the set of phonemes making up each phonetic tion, using single letters instead of graphemes, resulted in type with a set of graphemes making up the corresponding a very substantially lower consistency estimate (80.3%) orthographic type, such that all phonemes and all letters and a greater number of mappings (173) in the reading were appropriately matched and none were left unassigned direction. Table 3 lists these and corresponding estimates (no null phonemes or null graphemes). In other words, a derived from type counts. Ignoring the stress diacritic and strictly sequential grapheme-to-phoneme alignment is pos- treating stressed and unstressed letters (and phonemes) sible for Greek, fully accounting for all letters and pho- as identical would result in 88 grapheme–phoneme map- nemes, with the limitation that, because processing applies pings with an overall token consistency of 96.0% in the to individual word units, sandhi is effectively ignored. For feedforward and 80.8% in the feedback direction. example, here is the beginning of “the northern wind and Entropy values calculated following Borgwaldt et al. the sun” fable, aligned at the grapheme–phoneme level: (2004, p. 171) are listed in Table 4 under “Type Counts.” ORTHOGRAPHIC TRANSPARENCY IN GREEK 997

Table 1 Phoneme-to-Grapheme Mappings, Grouped by Phoneme and Sorted by Within-Phoneme Proportions Pair Proportion (%) Pair Proportion (%) Phoneme F (%) Grapheme Highest Other Phoneme F (%) Grapheme Highest Other > 8.26 ( 100.0 —— 1.5 ? 0.18 Ž® 100.0 —e 0.11 @ 1.96 € 97.2 L 6.87 › 76.7 € 2.6 ™ 23.3 €€ 0.17 M 4.22 ® 99.9 €Ú 0.02 ®® 0.05 €e 0.0006 MP 0.16 µ 100.0 A 0.56 —É 100.0 / 4.59 ¼ 99.5 " 1.87 Y 100.0 ¼¼ 0.48 # 6.71 e 78.0 P 8.34 À 54.6 ( 22.0 Á 44.8 C 1.35 ­ 66.5 ÀÀ 0.57 Ú 28.6 ƴ 0.04 ÉÀ 89.4 ڌ 4.8 ÉÁ 10.6 Ú­ 0.06 Q 8.43 É 99.9 ڌ­ 0.01 ÉÉ 0.14 D 0.08 o€ 85.5 2 1.27 Ì 100.0 oo 14.5 R 2.14 ›Ú 100.0 ( 0.07 o€ 57.9 S 0.82 ; 77.5 oo 41.6 Ú 14.5 o€ 0.38 ڌ 7.9 oo 0.09 ;; 0.03  0.03 o 95.1 Ú; 0.004 — 4.9 U 0.74 O 100.0 F 10.77 h 39.1 ! 0.67 O 85.0  33.6 › 7.0 e 10.8  6.9 Ú 10.7 e 0.60 › 5.3 O 0.39 [ 0.41 Oe 0.09 [Ú 0.09 Ú 0.08 Ú 0.02 OÚ 0.001 › 0.72 o 71.5 W 0.59 ß 57.1  22.1 À 42.9 o 6.1 AW 0.01 Éß 92.9 e 0.17 —Éß 7.1 oÚ 0.11 > 2.21 (Œ 95.2 Ú 0.04 ( 4.8 oe 0.01 # 2.60 eŒ 79.6 – 0.78 o 99.9 e 12.8 oo 0.07 (Œ 7.4 H 2.16 € 99.6 ( 0.15 €€ 0.38 F 4.40 hŒ 33.7 HP 0.47 Ý 99.1 Œ 29.7 I 2.43  86.7 eŒ 20.1  13.3 ڌ 9.9 ª 0.04  69.6 ›Œ 4.3 e 23.1  0.91  7.3 h 0.70 J 3.34 Ž 96.8 e 0.57 ŽŽ 3.0 [Œ 0.05 Ž® 0.19 Ú[ Œ 0.0005 K 6.25 — 99.8 ÚŒ 0.0003 —— 0.23 L 3.29 ›Œ 72.7 g 0.05 — 72.9 ™Œ 24.7  16.6 ™ 1.5 Ú 3.7 › 1.0 › 3.6 R 0.54 ›ÚŒ 99.4 —› 1.6 ›Ú 0.58 Note—Each line refers to a single phoneme–grapheme mapping in the corpus. F, relative frequency (percentage of oc- currences of all phoneme tokens) of this phoneme in the corpus; pair proportion, percentage of this phoneme–grapheme pair as a proportion of all occurrences (tokens) of this phoneme. The proportion of the dominant mapping is listed first, on the left; other mappings follow, on the right. A minimum of one significant digit is shown. Phoneme pairs [HP] and [MP] appear under “Phoneme” because they map to single letters ,Ý- and ,µ-, respectively. This results in a slight over- estimation of consistency for [H], [M], and [P]. 998 PROTOPAPAS AND VLAHOU

Table 2 Grapheme-to-Phoneme Mappings, Grouped by Grapheme and Sorted by Within-Grapheme Proportions Pair Proportion (%) Pair Proportion (%) Grapheme F (%) Phoneme Highest Other Grapheme F (%) Phoneme Highest Other ( 8.37 > 98.7 — 6.23 K 100.0 > 1.3  0.03 (Œ 2.11 > 100.0 —e 0.00 g 100.0 ( 1.48 # 99.7 — 0.04 g 100.0 # 0.3 —— 0.01 K 100.0 (Œ 0.19 # 100.0 —— 0.00 g 100.0 ; 0.64 S 100.0 —› 0.00 g 100.0 ;; 0.00 S 100.0 —É 0.56 A 100.0 o 1.33 – 58.7 —Éß 0.00 AW 100.0 › 38.9 Ý 0.47 HP 100.0  2.4 › 5.30 L 99.4 oo 0.04 ( 72.3 L 0.6 D 26.5 ›Œ 2.39 L 100.0 – 1.2 › 0.62 F 92.1 oo 0.00 ( 100.0 ! 7.5 oe 0.00 › 100.0 g 0.3 o 0.04 › 100.0 ›Œ 0.19 F 100.0 o€ 0.11 D 60.8 ›Ú 2.14 R 99.9 ( 39.2 R 0.1 o€ 0.00 ( 100.0 ›ÚŒ 0.53 R 100.0 oÚ 0.00 › 100.0 ® 4.22 M 100.0 Y 1.87 " 100.0 ®® 0.00 M 100.0 e 5.56 # 94.0 ¼ 4.56 / 100.0 # 6.0 ¼¼ 0.02 / 100.0 eŒ 2.07 # 100.0 À 4.81 P 94.7 e 1.19 F 97.4 W 5.3 F 2.1 Á 3.74 P 100.0 ! 0.3 ÀÀ 0.05 P 100.0 › 0.1 É 8.41 Q 100.0 eŒ 0.89 F 100.0 Éß 0.01 AW 100.0 ß 0.34 W 100.0 ÉÀ 0.04 ƴ 100.0 h 4.24 F 99.3 ÉÁ 0.00 ƴ 100.0 F 0.7 ÉÉ 0.01 Q 100.0 hŒ 1.48 F 100.0 Ú 1.67 F 69.4 Ì 1.27 2 100.0 C 23.3  3.87 F 93.4 S 7.2 › 4.1 g 0.1 ! 1.2 ! 0.03 F 1.0 › 0.02 g 0.2 ڌ 0.57 F 76.9 Œ 1.31 F 100.0 C 11.6 [ 0.04 F 100.0 S 11.5 [Œ 0.00 F 100.0 Ú[ 0.01 F 100.0 € 4.05 H 53.0 Ú[ Œ 0.00 F 100.0 @ 47.0 Ú; 0.00 S 100.0 €e 0.00 @ 100.0 Ú 0.00 F 100.0 € 0.05 @ 100.0 ÚŒ 0.00 F 100.0 €€ 0.01 H 70.9 Ú­ 0.00 C 100.0 @ 29.1 ڌ­ 0.00 C 100.0 €Ú 0.00 @ 100.0 ­ 0.90 C 100.0  2.10 I 100.0 O 1.31 U 56.4 e 0.01 ª 100.0 ! 43.6  0.03 ª 100.0 Oe 0.00 ! 100.0  0.32 I 100.0 O 0.00 ! 100.0  0.00 ª 100.0 OÚ 0.00 ! 100.0 Ž 3.24 J 100.0 µ 0.16 MP 100.0 ŽŽ 0.10 J 100.0 ™ 1.65 L 97.0 Ž® 0.18 ? 96.6 L 3.0 J 3.4 ™Œ 0.81 L 100.0 Note—Each line refers to a single grapheme–phoneme mapping. F, relative frequency (percentage of occurrences of all grapheme tokens) of this grapheme in the corpus; pair proportion, percentage of this grapheme–phoneme pair as a proportion of all occurrences (tokens) of this grapheme. The proportion of the dominant mapping is listed first, on the left; other mappings follow, on the right. A minimum of one significant digit is shown. Phoneme pairs [HP] and [MP] appear under “Phoneme” because single letters ,Ý- and ,µ-, respectively, map to them. ORTHOGRAPHIC TRANSPARENCY IN GREEK 999

Table 3 Statistics Related to the Transparency of Greek Orthography Type Counts Token Counts Total Total Mapping Total Mean Type– Consistency Entropy– Consistency Entropy– From To Pairs Pairs Token (%) Consistency V:C (%) Consistency V:C Grapheme Phoneme 118 1.40 .93 95.7 .991 0.873 95.1 .986 0.915 Letter Gra/Phoneme 173 4.81 .91 82.5 .931 0.930 80.3 .933 1.019 First letter Gra/Phoneme 64 2.06 .81 93.8 .963 0.567 90.9 .971 0.481 First letter Phoneme 54 1.74 .78 93.8 .954 0.567 91.3 .975 0.481 Phoneme Grapheme 118 3.03 .93 82.9 .824 0.873 80.3 .810 0.915 First phoneme Grapheme 64 1.78 .85 93.3 .913 0.567 93.8 .948 0.481 First phoneme Letter 54 1.46 .78 93.5 .976 0.567 94.3 .992 0.481 Note—There are fewer mappings from first letter to phoneme than from first letter to grapheme–phoneme pairs, because the same letter may map onto the same phoneme as a member of different graphemes. Mean pairs, average number of mappings from a single source unit (letter, grapheme, or phoneme); type–token and entropy–consistency, correlation coefficients calculated as Spearman’s ¼ between counts/estimates for the corre- sponding units of each mapping (N  total pairs); V:C, ratio of vowel to consonant phonemes; Gra/Phoneme, unique grapheme–phoneme pairs.

In addition, entropy values under “Token Counts” were related letters or letter combinations, specified in the Ap- calculated using the frequency-weighted word list as in pendix, Table A2. These sets indicate possible (not actual) the calculations of consistency above. Because of the dif- combinations; full expansion of the group rules using the ficulty in defining graphemes in less transparent orthog- complete letter sets indicated in Table A2 results in 4,163 raphies, Borgwaldt et al. (2004) used only word-initial individual rules, of which only 525 apply at least once in mappings. Therefore, we also list calculations based on the analyzed corpus. Obligatory preceding and following word-initial mappings, for comparison. Table 4 also lists contexts for the rules are fully indicated in Table A1. entropy for vowels and consonants separately, following A number of rules are marked as “optional” because of Borgwaldt et al. (2005; except that lemma type counts the ambiguous CiV sequences, which can be parsed in two were not available for our corpus, so word form type and ways. Specifically, because the cases with palatal conso- token counts were used instead). The distinction between nants are more specific to the CiV phenomenon, whereas consonants and vowels was always made on the basis of the [F] case amounts simply to pronouncing each compo- the phonemes (not letters) for each individual mapping, nent of the CiV as it would be pronounced in other contexts, so that a given letter might be counted as a vowel in one the rule set in Table A1 lists the palatal rules as special case and as a consonant in another (e.g., ,Ú-%[R] vs. cases, having precedence over the more general mappings. ,Ú-%[S]). Rules 8, 10, 23, 24, 29, 30, 41, 42, and 67 correspond to the two-grapheme parsings of the CiV, in which the Ci to- Regularity of Graphophonemic Conversion gether map onto a palatal consonant. Rules 17, 73, and 74 Table A1 in the Appendix lists an ordered set of 80 rules correspond to the three-grapheme parsings, in which the (originally based on the set of mappings in Petrounias, 2002, “unstressed i” grapheme alone (set U in Tables A1 and A2) pp. 498–502) that can transcribe correctly the complete text maps onto a palatal consonant. These special rules are corpus on the basis of the word form letter sequences only, listed as optional because the correct pronunciation, being without any additional information. Because of potential lexically determined, cannot be derived from orthographic overlap or ambiguity, rules are ranked in fixed order such or phonological information at the grapheme–phoneme that more specific rules take precedence over more gen- level. In the discussion, the term CiV rules refers to (op- eral ones (nonoverlapping rules, for which rank does not tional) rules that lead to a palatal consonant. matter, are listed in Greek alphabetical order). Many rules The number of times each rule actually applies and leads are actually group rules, in that each applies over a set of to a correct pronunciation, divided by the number of times

Table 4 Entropy Values for the Greek Othography, in Both Mapping Directions, for All Entire Words and for Word-Initial Units Only Mapping Type Counts Token Counts From To Total Vowels Consonants Total Vowels Consonants Grapheme Phoneme .163 .033 .198 .167 .085 .177 Letter Gra/Phoneme .786 .817 .643 .801 .977 .515 First letter Gra/Phoneme .290 .271 .301 .330 .520 .239 First letter Phoneme .275 .265 .282 .308 .513 .209 Phoneme Grapheme .589 .886 .330 .645 1.010 .311 First phoneme Grapheme .277 .406 .203 .251 .626 .071 First phoneme Letter .262 .400 .184 .229 .619 .041 Note—Calculations were done on both the type counts (unique word forms) and the token counts (fre- quency weighted). Gra/Phoneme, unique grapheme–phoneme pairs. 1000 PROTOPAPAS AND VLAHOU the rule should apply according to the matching criteria tokens, or 93.3% of the monosyllabic tokens and 35.0% and its rank in the rule set, constitutes an estimate of the of the corpus. Therefore, as far as monosyllables are con- regularity for the rule and is listed in Table A1, computed cerned, the lack of diacritic demanded by spelling con- on token counts. Regularity estimates lower than 1.00 are vention coincides with a lack of phonological stress at an caused by (1) an optional rule’s taking precedence and, estimated rate of more than 90%. Thus, in the reading di- thus, precluding application of a nonoptional but lower rection, with respect to stress, the diacritic does not seem ranked alternative, (2) phonologically stressed syllables to dramatically affect the overall consistency estimate, as not orthographically marked with a diacritic due to the would be expected from the small effect on regularity for monosyllable rule (Rules 5, 71, 75, 77, and 79; see the sec- the vowel rules noted above. tion on stress and monosyllables below), and (3) clear-cut cases of exceptions, such as [JM] and [KQ] in recent loans DISCUSSION (the latter are quite rare, accounting for 0.03%–0.56% of the corresponding digraph rules). Greek orthography is sufficiently consistent that we At the word level, regularity can be calculated as the have been able to segment and analyze a complete cor- proportion of words read correctly on the basis of their pus sequentially into graphemes and their relation to pho- orthography alone. A word is considered correct when all nemes. Our analyses indicate that the grapheme level is of its phonemes are correctly mapped. When the optional appropriate for expressing the widely perceived notion rules are included in the rule set, word-level regularity that Greek is simpler to read than to write, because it is is 92.7% (by token count). When the optional rules are at this level that consistency and entropy estimates come removed from the rule set, word-level regularity is 95.3%. out strongly asymmetric in favor of the feedforward di- Finally, when optional rules are allowed to apply option- rection. Specifically, in the feedforward direction, con- ally, with either outcome counting as correct, the word- sistency estimates for individual graphemes range from level regularity estimate reaches 97.3%. The latter value 53.0% for ,€- and 56.4% for ,O- to 100.0% for the major- surely overestimates the regularity of the system because ity of graphemes. In the feedback direction, consistency it sidesteps the whole irregularity problem raised by the estimates for individual phonemes range from 39.1% existence of the “optional” rules. However, it is useful to for ,i- (33.7% for stressed ,i-) to 100.0% for a small keep in mind that there is a certain order even in these minority of phonemes. A nonparametric comparison of cases, because the ambiguity is always between two well- the 84 grapheme consistency estimates with the 39 pho- defined alternatives and not totally unpredictable, as it neme consistency estimates confirms that the asymmetry might be in more opaque orthographies. is significant (Mann–Whitney U  761.5, Z  5.33, asymptotic two-tailed p .0005; the same result is ob- Stress and Monosyllables tained if consistency estimates are weighted by unit token It is not possible to distinguish which disyllabic tokens frequency: U  990.0, Z  3.52, p .0005). It was also in the corpus bear phonological stress without examin- clear in the overall counts, where spelling inconsistency ing the phrase context of each individual occurrence. It is, (proportion of nondominant mappings) at the grapheme– however, possible to reach an approximate estimate for the phoneme level (19.7%) was four times as large as reading monosyllables, because examination of a sample of oc- inconsistency (4.9%). currences indicates, in the vast majority of cases, whether A factor frequently overlooked in orthographic analy- a stressed or unstressed reading strongly predominates. ses and in reading models concerns suprasegmental fea- Therefore, to estimate the effect of inconsistencies in ap- tures and their orthographic notation—namely, stress and plication of the orthographic stress diacritic relative to the metrical patterns and the corresponding diacritics that phonological stress, we classified all monosyllables in the signify them. We have confirmed that the Greek stress corpus according to whether they bear phonological stress diacritic constitutes an additional source of orthographic or not (the cumulative frequency of unclassifiable mono- information, contributing somewhat to overall inconsis- syllable types was negligible). tency at the metrical level, due to the peculiarities of the There were 466 monosyllabic types (0.2%) accounting spelling rules. Further research in stress-assigning lan- for 37.6% of the total token count (11,108,247 tokens). Of guages should take stress and the diacritics into account, these 466 word forms, we identified 352 (79.6%) bearing aiming for a full analysis of the orthographic system at phonological stress, including 94 native Greek words, 189 multiple levels. Here, the presence of the diacritic does recent loans (mainly from English, such as bar, goal, etc.), not reduce inconsistency at the grapheme–phoneme level, 35 exclamatory and onomatopoetic items, some stress- because ignoring it results in fewer grapheme–phoneme bearing function words, and more than 50 abbreviations pairs and because its disambiguating role becomes ap- either retaining the stressed syllable of the original or parent in grapheme segmentation, not in mapping to and having been lexicalized. The total token count of these from phonemes. Therefore, any beneficial effects of the types was 743,694—that is, 6.7% of the total monosyl- diacritic would be discernible only in predictability (or lable token count and 2.5% of the corpus. There were also regularity) and not in consistency. 90 types not bearing phonological stress, primarily func- In the following, we will compare our findings with those tion words but also including some abbreviations retain- from different orthographies and methodologies, discuss- ing an unstressed syllable; these accounted for 10,341,231 ing the advantages and limitations of different approaches. ORTHOGRAPHIC TRANSPARENCY IN GREEK 1001

In addition, we will consider two special issues of particu- mappings are representative of the full words in every lar importance—namely, the consistency– predictability language. distinction and the status of the CiV ambiguity. Representativeness of Analyzed Words Cross-Linguistic Comparisons or Word Parts The regularity of graphophonemic conversion at the Previous studies of orthographic transparency have word level (95.3%, excluding the “optional” rules) was usually analyzed a small subset of all mappings that occur found to be higher for Greek than for German (90.4%) in the written language. A common restriction seen in the or English (79.3%), as reported by Ziegler et al. (2000). literature is to analyze monosyllables only (Ziegler et al., However, these numbers may not be directly comparable, 1996; Ziegler et al., 2000; Ziegler et al., 1997). Our results because German and English estimates were derived on suggest that although this approach may be justified for the basis of monosyllables only. some languages, such as English, it is not appropriate for The overall consistency for Greek at the grapheme– cross-linguistic comparisons. phoneme level was 95.1% in the feedforward and 80.3% Specifically, Greek has very few monosyllabic word in the feedback direction. There are few directly compa- forms, and the majority of them are not representative of rable estimates from other languages. According to Spen- the language. Most are function words, whereas others cer (2007, p. 306), Hanna, Hanna, and Hodges (1966) are recent loans and abbreviations, strongly atypical in reported a feedback consistency of 73% at the phoneme– their syllabic structure. Therefore, no analysis based on grapheme level, corresponding to 50% at the word level. monosyllables only could be expected to yield an outcome However, summation of the most frequent sonograph representative for the orthography of the language as a probabilities for each of the 163 graphemes and for each whole. In English, monosyllables are sufficiently numer- of the 39 phonemes in the “adult 3K” corpus of Spencer ous and diverse to permit a meaningful analysis. However, (2009, supplemental material, Appendix C) would re- it remains to be systematically investigated whether find- sult in consistency estimates of 77.6% and 57.0% for the ings from monosyllables can be generalized, due to pos- feedforward and feedback directions, respectively, at the sible differences from polysyllables (Kessler & Treiman, grapheme–phoneme level. 2001; Lété et al., 2008). In the context of cross-linguistic Ziegler et al. (1996; Ziegler et al., 1997) examined only investigations, with some languages containing predomi- spelling bodies (rimes) of monosyllables and reported nantly multisyllabic words, it is important to extend analy- their bottom-line findings in terms of proportion of words ses to more inclusive and representative samples of the that admit more than a single mapping. They considered vocabulary. the presence of any alternatives as indicative of incon- To overcome the monosyllable restriction and perform sistency, regardless of the relative proportion of alterna- meaningful cross-linguistic comparisons, Borgwaldt et al. tives, and they did not distinguish dominant from other (2004, 2005) analyzed word-initial mappings, using sin- mappings. Therefore, their estimates for French (79.1% gle letters. For comparison, we have calculated entropy for spelling inconsistent and 12.4% reading inconsistent) and both word-initial only and full-word mappings (Table 4). English (72.3% and 30.7%, respectively) may be gross The results show that, in the feedforward (reading) di- overestimates of the overall inconsistency of those ortho- rection, using word-initial single-letter mapping to pho- graphic systems. nemes results in entropy values almost three times less On the basis of the entropy results for word-initial let- than values calculated over whole-word letter-to-phoneme ter type counts, closely matching the methodology of mappings and about twice as large as values calculated Borgwaldt et al. (2004), Greek is about equally ambigu- over whole-word grapheme-to-phoneme mappings. In ous in both the feedforward (first letter % first phoneme, the feedback (spelling) direction, using word-initial map- H  .275) and the feedback (first phoneme % first letter, pings results in entropy values less than half of the values H  .262) direction, being similar to Dutch in reading calculated over whole-word mappings, regardless of the and similar to French in writing (comparing with Figure 1 target units (letters or graphemes). Therefore, using word- in Borgwaldt et al., 2004, p. 175). For reading, Greek is initial letter mappings in Greek results in a gross underes- less transparent than Hungarian and more transparent than timation of orthographic transparency, as compared with French, German, and English. For spelling, Greek is less calculations over letters using the entire words. This may transparent than Hungarian, Dutch, and German and more be attributed to the nonrepresentativeness of word-initial transparent than English. mappings in failing to exhibit the full spectrum of map- Considering entropy calculations for vowels and con- ping complexities that may be encountered in different sonants separately (and comparing with Figure 2 in Borg- parts of the words. waldt et al., 2005, p. 219), for the feedforward (reading) Breaking down the entropy analysis into separate calcu- direction only, Greek is less transparent than Hungarian lations for vowels and consonants, we can compare the ef- and Italian and more transparent than Portuguese, Dutch, fect of restricting our analysis to word-initial letters on the French, German, and English, as far as vowels are con- entropy estimates for each type of phoneme. Focusing on cerned. With respect to consonants, Greek is only more the left side of Table 4 (type counts, to match the method- transparent than French, exceeding in consonant entropy ology of Borgwaldt et al. [2005] as closely as possible), we all these other languages. However, these cross-language see that, in the feedforward direction, the entropy of word- comparisons are valid only to the extent that word-initial initial vowels is somewhat lower than that of word-initial 1002 PROTOPAPAS AND VLAHOU consonants, whereas the reverse is true when full-word ity of verbs (Pinker & Prince, 1988). In contrast, in the letter mappings are considered. In the feedback direction, reading literature, rules tend to be interpreted as account- there are differences in the relative values of vowels and ing for maximum regularity; thus, English regular read- consonants, depending on whether word-initial or full- ings correspond to the most frequent grapheme–phoneme word mappings are taken into account, but the direction of mappings (Coltheart et al., 2001). The former sense is the pairwise comparison remains unaffected. However, the theoretical and can be investigated empirically with novel relationship between vowel and consonant entropy seems or otherwise unmarked stimuli. The latter sense, however, substantially distorted, in both directions, in comparison is distributional and can be examined by reference to the with phoneme/grapheme mappings. corpus data, by comparing the frequency of application Our results indicate that the restriction to word-initial versus nonapplication of the rules in question. mappings may be vulnerable to systematic distortions As is shown in Table A1, all CiV rules, with the notable due to differences in the distribution of phonemes (and exception of Rule 8 involving ,o-, apply less often than letters/ graphemes) across word positions. In Greek, this not. This suggests that the palatal pronunciation may be may be due to the relative proportion of vowels to conso- exceptional and the [F] option regular. The great variability nants being much lower (about half) word initially than it of observed rates of application and the nonconformance is overall (see Table 3, columns “V:C”), perhaps reflecting of ,o- to this pattern limit the confidence of this assertion. the preponderance of CV syllables in the language. Word- Further analyses were undertaken by breaking down the initial mappings will be representative only insofar as the group rules into their constituent sets, by i-grapheme and distribution of syllable types in the language is uniform— by vowel, and further breaking down by stress (stressed vs. that is, syllables with consonantal onsets are as frequent as unstressed V) and by position in the word (initial, medial, syllables without onsets—and to the extent that onset and final syllable). In each of these groupings, the CiV rules coda phonotactics and spelling patterns are similar. apply less often than not in the great majority of cases, In languages with rime-level consistency, such as En- but not in all. Therefore, from an overall frequency per- glish, onset spelling patterns are not representative of all spective, the palatal readings of the CiV clusters appear to spelling patterns. Indeed, some analyses of orthographic constitute the exception rather than the rule. However, the transparency in English have focused on rimes only, ignor- situation is not so clear if we take into account the many ing onsets, because the rime is where most inconsistencies different CiV combinations that are possible. seem to be encountered. In languages with a predominance The letter sets indicated in Table A2 cover all possible of CV syllables, such as Greek, onset spelling patterns will letter combinations, whether or not they are ever encoun- be nonrepresentative in containing a larger proportion of tered in any word spellings. In fact, of the 3,744 individual consonants. If consonant mappings are not as consistent CiV rules that can be obtained by fully expanding the 12 as vowel mappings, consistency estimates based on word- group rules (rows 8, 10, 17, 23, 24, 29, 30, 41, 42, 67, 73, initial mappings only are unlikely to constitute valid in- and 74 in Table A1), only 650 letter combinations (cases) dices for the language. Word-initial mappings may addi- appear in the corpus. Of these, in only 298 cases does the tionally undersample the graphophonemic mapping space corresponding individual CiV rule actually apply at least if certain graphemes or phonemes cannot occur syllable once. The CiV rule applies more often than not in 137 of initially. Our calculations suggest that both of these condi- these 298 cases, but the sum token count of actual rule ap- tions are present in Greek, and thus the validity of using plications in cases in which the CiV rule dominates (i.e., only word-initial mappings is questionable at best. applies more often than not) is approximately equal to the total count in cases in which the rule appears to recede CiV and Regularity (50.2% vs. 49.8%). This happens because the individual Whether or not the origin of the CiV inconsistency can CiV rules dominate in fewer but more frequent cases. be diachronically traced to a distinction between vernacu- Therefore, this corpus analysis cannot provide a clear an- lar and literary vocabulary (Petrounias, 2002), it does not swer to the question of whether the CiV rules correspond help determine what is “regular” and what is “irregular” to the regular case in Greek grapheme-to-phoneme map- behavior. The mapping (C, i, V) may be regular because ping or, indeed, whether a case for a regular/irregular dis- it is more general; or the (Ci, V) mapping may be regular tinction can be made regarding this phenomenon. because in the ordered rule set, it must constitute a higher precedence rule. Further research will be needed to deter- Effects of Boundaries and Context mine whether one of the alternatives can be considered to on Consistency be a rule, to which the other alternative would constitute Analyses of consistency are necessarily limited to the an exception. discrete units used in the computations, be they letters, In the linguistic sense, a rule may correspond to the de- graphemes, or larger units. However, neighboring units fault or unmarked behavior—that is, what happens when may affect the distribution of the mappings. Moreover, nothing special applies. There is no quantitative impli- with the exception of single letters, it is not always clear cation for the regularity corresponding to this rule. So, what constitutes a unit—that is, where the segmentation the regular German concerns a minority of nouns boundaries lie. These issues are not captured by the con- (Marcus, Brinkmann, Clahsen, Wiese, & Pinker, 1995), sistency estimates. However, they may influence the ef- whereas the regular English past tense concerns a major- fects of consistency on reading and spelling performance, ORTHOGRAPHIC TRANSPARENCY IN GREEK 1003 if context is taken into account for processing at the basic Grain Size of Transparency Analysis phoneme–grapheme level (Kessler & Treiman, 2001). Notwithstanding the occasional effects of context, our Mapping consistency is a notion distinct from predict- results suggest that the grapheme seems to be appropriate- ability, because context may determine (or bias) the appro- size unit for analyzing the transparency of Greek orthog- priate mapping among a set of alternatives. For example, raphy. On the one hand, the smaller size letter units are [W] can be spelled with ,ß- (57.1%) or ,À- (42.9%), so its less consistent and, therefore, unable to adequately cap- consistency is only 57.1%. However, the correct spelling ture the systematicity in graphophonemic mappings. On depends on the context: When a voiced consonant follows, the other hand, no larger sublexical units, such as rimes, it is ,À-. Therefore, spelling of [W] is 100% predictable. In seem to be necessary or useful, because resorting to this case, the low consistency of the phoneme is mislead- larger units would increase granularity without improv- ing in that the phonological context provides all neces- ing consistency. Larger unit sizes would not help resolve sary information for spelling. Conversely, the grapheme the predominant source of inconsistency in the feedfor- ,o- is always pronounced [›], so its consistency is 100%. ward direction—namely, the CiV cases—because those However, it is not possible to determine, from the letter se- are determined lexically. The rime–body level of analyz- quence alone (even taking context into account), whether ing English monosyllabic words is not appropriate for the pair of letters ,o- constitute a grapheme or, rather, other languages with many polysyllabic words as well: should be parsed as two graphemes, ,o- and ,-, to be French pronunciation and spelling ambiguities are not pronounced [›] and [F], respectively. In this case, the high reduced when rime–body mappings are used instead of consistency of the grapheme belies the unpredictability of grapheme– phoneme mappings (Lété et al., 2008; Peere- its status as a grapheme. man & Content, n.d.). In the case of mapping phonemes to graphemes, the The calculation of quantitative estimates of ortho- predictability of seemingly inconsistent phonemes can graphic transparency is not an end in itself. It is meaning- be estimated by examining the contexts of each pho- ful to the extent that it can contribute to the generation or neme in which different graphemes appear. There are no testing of specific psycholinguistic hypothesis or to the known constraints at the phonological level to the alter- selection, control, and construction of proper experimen- native spellings of vowels [F], [B], and [L], which are the tal stimuli, for work both within and between languages. major sources of inconsistency. In contrast, the spelling An important issue concerns the psychological reality of of [W] is predictable, as was noted above, as is the spell- different grain sizes across languages. Analyses are often ing of [P] (,Á- at word end, ,À- otherwise). Spellings for performed to minimize ambiguity. However, there is no [S] and [ C] are constrained in that the ,Ú- variants may a priori reason that maximizing systematicity is a valid appear only after ,(-, ,e-,Åor ,h-, although not obligato- objective or that it results in estimates that are relevant rily (cf. HIBCQF%€eŒ­Éh, MPBCQF%µeڌÉh). The palatal for modeling. A more solid foundation can be sought in consonants are also constrained. Spellings of [›], [!], correlations between transparency analyses and perfor- [@], [(] with a single consonant letter appear only before mance on reading tasks. If readers rely on a particular or- a high vowel (@FQ>%€›ŒÉ(, !#/F%OeŒ¼); otherwise, an thographic and phonological grain size when they read (or i-grapheme is added (@>IF%€Ú(Œ , !LKF%O›Œ—). When spell), transparency measures based on the corresponding preceded by a same-voicing consonant, [›]Åand [!] may functional units will correlate more highly with perfor- be spelled with an i-grapheme only (Q#Q!#P%ÉeŒÉ›eÁ, mance on the reading (or spelling) tasks. FP!#P%ŒÀeÁ, S›LP%;›ŒÁ; but cf. #P!#P%eŒ ÀOeÁ, Along these lines, Treiman et al. (1995) showed that S›#P%;oeÁ). Likewise, when preceded by [J], [g] is word naming was affected not only by the consistency of spelled with an i-grapheme only (WFJg>%ßhŽ(Œ ; but individual graphemes, but also by the consistency of reli- there are exceptions: IFJg>%hŽ—(Œ). able units, such as consonantal onsets and orthographic By adding the phoneme tokens for [W], [ƴ ], and [P], for rimes. Performance was not affected by the consistency of which the correct spelling is entirely predictable, to the less reliable combinations, such as the head (initial con- sum of consistent mappings, we can derive an estimate sonant and vowel). Such findings seem to establish the of the lower bound of predictability for Greek feedback psycholinguistic relevance of onset–rime units in English, mapping (spelling) at about 84.3%. Is it useful to take where large ambiguities exist at the phoneme–grapheme context-based predictability into account when consid- level. More recently, Borgwaldt et al. (2005) found that ering effects of consistency on spelling performance? If higher onset letter entropies (i.e., less consistent word- consistency alone determines spelling difficulty, [W], [ƴ ], initial letter–phoneme mappings) were associated with [P], and the palatal consonants should be spelled incor- longer naming latencies in Italian, Dutch, and English, rectly as frequently as the vowels [F], [B], and [L], which three languages differing greatly in orthographic transpar- are comparably inconsistent. Although specific estimates ency, and argued that letters are important functional units of these spelling errors are not yet available for Greek, that should not be ignored in favor of larger grain sizes. the fact that the vowel spellings are frequently studied but To make further progress on this issue, additional research consonant spellings are not suggests that grapheme-level is needed, examining contrasting predictions from graph- consistency does not suffice and context or multiple-size emes and other units of various sizes and word positions, units may have to be taken into account in the quantifica- in both directions, across a variety of orthographies. tion of transparency that is relevant for psycholinguistic A comparison of entropy (or consistency) values cal- investigation. culated over graphemes versus letters (first two rows of 1004 PROTOPAPAS AND VLAHOU

Tables 3 and 4) provides a clue toward resolving the unit tially on the resolution of theoretical matters. For example, issue for Greek. Calculations using single letters result in the notion of regularity, critically hinging on the existence very high estimates of feedforward ambiguity—higher, of rules, is distinct from the notion of consistency, which in fact, than the corresponding estimates for the feedback requires a fixed set of mappings among appropriate-sized (spelling) direction. This contradicts the commonly held units. The DRC simulations of Coltheart et al. (2001) did notion that Greek is consistent for reading and inconsis- not support the hypothesis that consistency may arise tent for spelling. However, intuitive notions may well be as an effect of lexical neighborhoods. They interpreted incorrect. More important, these estimates seem to run consistency effects as caused by the serial application of counter to available cross-linguistic data on the relative overlapping graphophonemic rules, when the correct rule transparency of Greek, such as the findings of Seymour becomes activated after an inappropriate one. In contrast, et al. (2003), according to which Greek is placed near Zevin and Seidenberg (2006) reviewed graded consis- the top of the list of transparent orthographies, on the tency effects in word and nonword reading and attributed basis of the accuracy and speed of reading simple words them to the statistical properties of spelling–sound map- and nonwords by beginning readers (Grade 1). Additional pings. They simulated previous behavioral findings with a evidence consistent with the notion that Greek is feed- parallel distributed model of reading aloud and concluded forward consistent and that the grapheme–phoneme is against graphophonemic mapping rules that are not sensi- the appropriate level of analysis (Goswami, Porpodas, & tive to their probability of application. In Greek, contrast- Wheelwright, 1997; Porpodas, 1999; Porpodas, Pantelis, ing predictions regarding regularity and consistency may & Hantziou, 1990) has been reviewed by Ziegler and be derived from the CiV pattern and the context effects Goswami (2005). A more stringent test for determining on consistency. the appropriate level of analysis can be derived from our The notion of entropy has been introduced as a more entropy data (Table 4). Specifically, in the feedforward comprehensive measure of consistency than is the per- direction, vowels are more ambiguous than consonants at centage of dominant mappings, because it takes into ac- the letter level but less ambiguous at the grapheme level, count the relative proportions of nondominant mappings, and the difference is quite substantial in both cases. Inso- and not as a theoretically distinct construct. It remains far as mapping consistency affects reading efficiency, we to be empirically demonstrated whether entropy values should expect vowels to affect reading performance more are more psycholinguistically relevant than consistency than do consonants, if the input is analyzed at the single percentages. However, it may not be simple to disentan- letter level, or less, if the input is analyzed by graphemes. gle the two if they are too similar to lead to differential Due to the rapid learning of letter combinations, such predictions. For Greek, the correlations between consis- effects may be discernible only at the earliest stages of tency percentages and entropy values are listed in Table 3 learning to read, if at all. (“Entropy–Consistency” columns). The two indices cor- In the preceding discussion we have assumed that there relate very highly, especially in the feedforward direction. is a single most appropriate level of analysis, psychologi- There is some divergence in the feedback direction, sug- cally real and dominating sublexical reading performance. gesting that situations like the [D]/[!] example mentioned This view is consistent with the DRC approach (Coltheart in the introduction may allow differential predictions re- et al., 2001), in which only grapheme–phoneme mappings garding the ease of initial learning to spell. With proper are hypothesized to exist in the nonlexical route, exclud- control of the order and amount of teaching and practice ing larger sized units (see also Coltheart, Curtis, Atkins, & for each mapping, the most appropriate consistency index Haller, 1993, p. 603, for empirical justification). However, may be determined. it remains plausible that more than a single level of cross- On the issue of type versus token counts, our analy- code representation is psychologically real, consistent ses do not seem to offer directions for resolution. Even with the grain-size theory (Ziegler & Goswami, 2005), though there are some differences in the results using the which posits a developmental progression through mul- two kinds of counts, comparisons preserve their direction tiple levels of representation, from coarser to finer, sup- and, usually, their relative proportions, both for entropy ported by the distributional properties of the developing (Table 4) and for consistency percentages (Table 3). In the lexicon, such that denser lexical neighborhoods facilitate feedforward direction, using type or token counts does finer phonological analysis. In a similar vein, Ehri (2005) not seem to affect total entropy or consistency values very describes the development of sight-word vocabulary as a much. In the feedback direction, using type counts leads process of forming connections not only between graph- to higher total entropy estimates on whole-word mappings emes and phonemes, but also between orthographic and but lower on word-initial mappings, as compared with phonological sublexical representations of various sizes. using token counts. The relative difference in entropy val- At the final phase of this progression, readers familiar with ues follows the correlation between type and token counts the alphabetic system of their language retain entire words (listed in Table 3, “Type–Token” column). Using types in memory. However, the importance of larger-sized units instead of tokens has the largest effect for mappings be- may be smaller in more transparent orthographies. tween word-initial phonemes and letters—that is, in the condition chosen by Borgwaldt et al. (2004) for their cal- Comparisons Among Indices and Counts culations. However, as was noted above, this condition Alternative modes of quantifying transparency, based does not seem to be the most appropriate one. The choice on diverging functional hypotheses, may bear substan- between type and token counts, then, remains to be made ORTHOGRAPHIC TRANSPARENCY IN GREEK 1005 on the basis of psycholinguistic evidence and theory of the Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Mod- sort discussed by Conrad et al. (2008). els of reading aloud: Dual-route and parallel-distributed-processing approaches. Psychological Review, 100, 589-608. doi:10.1037/0033 -295X.100.4.589 Conclusion Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. In this work, we have provided quantitative indices of (2001). DRC: A dual route cascaded model of visual word recog- orthographic transparency for Greek. We have compared nition and reading aloud. Psychological Review, 108, 204-256. our findings with similar data reported for other orthog- doi:10.1037/0033-295X.108.1.204 Conrad, M., Carreiras, M., & Jacobs, A. M. (2008). Contrasting effects raphies, and we have discussed the limitations and impli- of token and type syllable frequency in lexical decision. Language & cations arising from particular methodological choices Cognitive Processes, 23, 296-326. doi:10.1080/01690960701571570 and shortcuts previously applied. Our results indicate that Ehri, L. C. (2005). Learning to read words: Theory, findings, and restricting the analysis to unrepresentative samples of the issues. Scientific Studies of Reading, 9, 167-188. doi:10.1207/ orthography, such as monosyllabic words or word-initial s1532799xssr0902_4 Goswami, U., Porpodas, C., & Wheelwright, S. (1997). Children’s letters, may distort the outcomes and render cross- linguistic orthographic representations in English and Greek. European Journal comparisons uninterpretable. However, it remains to be es- of Psychology of Education, 3, 273-292. tablished whether meaningful cross-linguistic comparison Grainger, J., & Ziegler, J. C. (2008). Cross-code consistency in a on a common level of analysis is possible. If the statistical functional architecture for word recognition. In E. L. Grigorenko & A. J. Naples (Eds.), Single-word reading: Behavioral and biological properties of each orthography determine the psychologi- perspectives (pp. 129-158). New York: Erlbaum. cally dominant units of processing, transparency analyses Hanna, P. R., Hanna, J. S., & Hodges, R. E. (1966). Phoneme– may help identify that unit but will not be amenable to di- grapheme correspondences as cues to spelling improvement. Wash- rect comparisons. The critical cross-linguistic work, then, ington, DC: U.S. Department of Health, Education, and Welfare. must take place at the theoretical level of representations Hatzigeorgiu, N., Gavrilidou, M., Piperidis, S., Carayannis, G., Papakostopoulou, A., Spiliotopoulou, A., et al. (2000). Design and processes, along the lines of dual-route or distributed and implementation of the online ILSP corpus. In Proceedings of the models of word recognition and reading aloud. Second International Conference of Language Resources and Evalu- ation (LREC) (Vol. 3, pp. 1737-1740). Athens. AUTHOR NOTE Hofmann, M. J., Stenneken, P., Conrad, M., & Jacobs, A. M. (2007). Sublexical frequency measures for orthographic and phonological The quantification of Greek orthographic transparency is part of a units in German. Behavior Research Methods, 39, 620-629. larger ongoing effort at the Institute for Language and Speech Process- Jared, D. (2002). Spelling–sound consistency and regularity effects ing (ILSP) to provide psycholinguistic resources for the Greek language. in word naming. Journal of Memory & Language, 46, 723-750. Data tables, code implementing graphophonemic conversion by rule, doi:10.1006/jmla.2001.2827 and other material may be downloaded from http://speech.ilsp.gr/iplr/. Kessler, B., & Treiman, R. (2001). Relationships between sounds and We thank Aimilios Chalamandaris and the ILSP text-to-speech group for letters in English monosyllables. Journal of Memory & Language, 44, providing the text corpora and reference phonetic transcriptions; Elina 592-617. doi:10.1006/jmla.2000.2745 Nomikou and Stella Drakopoulou for checking CiV words; Aikaterini Kessler, B., Treiman, R., & Mullennix, J. (2008). Feedback- Pantoula for help with the monosyllables; and Efthymia C. Kapnoula consistency effects in single-word reading. In E. L. Grigorenko & for help with the GPC rules. Correspondence regarding this article may A. J. Naples (Eds.), Single-word reading: Behavioral and biological be sent to A. Protopapas, Institute for Language and Speech Process- perspectives (pp. 159-174). New York: Erlbaum. ing, Artemidos 6 & Epidavrou, GR-151 25 Maroussi, Greece (e-mail: Lété, B., Peereman, R., & Fayol, M. (2008). Consistency and word- [email protected]). frequency effects on spelling among first- to fifth-grade French chil- dren: A regression-based study. Journal of Memory & Language, 58, REFERENCES 952-977. doi:10.1016/j.jml.2008.01.001 Marcus, G. F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. Balota, D. A., Yap, M. Y., & Cortese, M. J. (2006). Visual word recog- (1995). German inflection: The exception that proves the rule. Cogni- nition: The journey from features to meaning (a travel update). In M. J. tive Psychology, 29, 189-256. doi:10.1006/cogp.1995.1015 Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics McKague, M., Davis, C., Pratt, C., & Johnston, M. B. (2008). The (2nd ed., pp. 285-375). London: Academic Press. doi:10.1016/B978 role of feedback from phonology to orthography in orthographic -012369374-7/50010-9 learning: An extension of item-based accounts. Journal of Research Borgwaldt, S. R., Hellwig, F. M., & De Groot, A. M. B. (2004). in Reading, 31, 55-76. doi:10.1111/j.1467-9817.2007.00361.x Word-initial entropy in five languages: Letter to sound and sound Moscoso del Prado Martín, F., Ernestus, M., & Baayen, R. H. to letter. Written Language & Literacy, 7, 165-184. doi:10.1075/ (2004). Do type and token effects reflect different mechanisms? Con- wll.7.2.03bor nectionist modeling of Dutch past-tense formation and final devoicing. Borgwaldt, S. R., Hellwig, F. M., & De Groot, A. M. B. (2005). Brain & Language, 90, 287-298. doi:10.1016/j.bandl.2003.12.002 Onset entropy matters—Letter-to-phoneme mappings in seven lan- Peereman, R., & Content, A. (n.d.). Quantitative analyses of or- guages. Reading & Writing, 18, 211-229. doi:10.1007/s11145-005 thography to phonology mapping in English and French. Retrieved -3001-9 September 22, 2008, from http://homepages.vub.ac.be/~acontent/ Burani, C., Barca, L., & Ellis, A. W. (2006). Orthographic complexity OPMapping.html. and word naming in Italian: Some words are more transparent than Perry, C., Ziegler, J., & Coltheart, M. (2002). How predictable is others. Psychonomic Bulletin & Review, 13, 346-352. spelling? Developing and testing metrics of phoneme–grapheme con- Burt, J. S., & Blackwell, P. (2008). Sound–spelling consistency in tingency. Quarterly Journal of Experimental Psychology, 55A, 897- adults’ orthographic learning. Journal of Research in Reading, 31, 915. doi:10.1080/02724980143000640 77-96. doi:10.1111/j.1467-9817.2007.00362.x Petrounias, E. V. (2002). Neoellinikí grammatikí kai sigkritikí análisi: Chalamandaris, A., Raptis, S., & Tsiakoulis, P. (2005, September). Tómos A. Fonitikí kai eisagogí sti fonología [ Rule-based grapheme-to-phoneme method for the Greek. In Inter- and comparative analysis: Vol. A. Phonetics and introduction to pho- speech 2005: 9th European conference on speech communication and nology]. Thessaloniki: Ziti. technology (pp. 2937-2940). Lisbon, Portugal. Pinker, S., & Prince, A. (1988). On language and connectionism: Anal- Coltheart, M. (1978). Lexical access in simple reading tasks. In ysis of a parallel distributed processing model of language acquisition. G. Underwood (Ed.), Strategies of information processing (pp. 151- Cognition, 28, 73-193. doi:10.1016/0010-0277(88)90032-7 216). London: Academic Press. Porpodas, C. D. (1999). Patterns of phonological and memory process- 1006 PROTOPAPAS AND VLAHOU

ing in beginning readers and spellers of Greek. Journal of Learning tion, use, and acquisition of English orthography. Journal of Ex- Disabilities, 32, 406-416. doi:10.1177/002221949903200506 perimental Psychology: General, 124, 107-136. doi:10.1037/0096 Porpodas, C. D. (2006). Literacy acquisition in Greek: Research review -3445.124.2.107 of the role of phonological and cognitive factors. In R. M. Joshi & Zevin, J. D., & Seidenberg, M. S. (2006). Simulating consistency ef- P. G. Aaron (Eds.), Handbook of orthography and literacy (pp. 189- fects and individual differences in nonword naming: A comparison 199). Mahwah, NJ: Erlbaum. of current models. Journal of Memory & Language, 54, 145-160. Porpodas, C. D., Pantelis, S. N., & Hantziou, E. (1990). Phonologi- doi:10.1016/j.jml.2005.08.002 cal and lexical encoding processes in beginning readers: Effects of age Ziegler, J. C., & Goswami, U. (2005). Reading acquisition, devel- and word characteristics. Reading & Writing, 2, 197-208. doi:10.1007/ opmental dyslexia, and skilled reading across languages: A psy- BF00257971 cholinguistic grain size theory. Psychological Bulletin, 131, 3-29. Seymour, P. H. K., Aro, M., & Erskine, J. M. (2003). Foundation lit- doi:10.1037/0033-2909.131.1.3 eracy acquisition in European orthographies. British Journal of Psy- Ziegler, J. C., Jacobs, A. M., & Stone, G. O. (1996). Statistical analysis chology, 94, 143-174. doi:10.1348/000712603321661859 of the bidirectional inconsistency of spelling and sound in French. Be- Shannon, C. E. (1948). A mathematical theory of communication. Bell havior Research Methods, Instruments, & Computers, 28, 504-515. System Technical Journal, 27, 379-423 and 623-656. Ziegler, J. C., Perry, C., & Coltheart, M. (2000). The DRC model Shannon, C. E. (1950). Prediction and entropy of printed English. Bell of visual word recognition and reading aloud: An extension to Ger- System Technical Journal, 30, 50-64. man. European Journal of Cognitive Psychology, 12, 413-430. doi:10 Spencer, K. A. (2007). Predicting children’s word-spelling difficulty for .1080/09541440050114570 common English words from measures of orthographic transparency, Ziegler, J. C., Perry, C., & Coltheart, M. (2003). Speed of lexical phonemic and graphemic length and word frequency. British Journal and nonlexical processing in French: The case of the regularity effect. of Psychology, 98, 305-338. doi:10.1348/000712606X123002 Psychonomic Bulletin & Review, 10, 947-953. Spencer, K. A. (2009). Feedforward, -backward, and neutral transpar- Ziegler, J. C., Stone, G. O., & Jacobs, A. M. (1997). What is the ency measures for British English. Behavior Research Methods, 41, pronunciation for -ough and the spelling for /R/? A database for com- 220-227. doi:10.3758/BRM.41.1.220 puting feedforward and feedback consistency in English. Behavior Spencer, K. A. (in press). Predicting children’s word-reading difficulty Research Methods, Instruments, & Computers, 29, 600-618. for common English words: The effect of complexity and transpar- ency. British Journal of Psychology. NOTES Stathis, C., & Carayannis, G. (1999). Emploutismós morfologikón lexikón me órous kai ipostíriksi kiménon entáseos óron se diadikasíes 1. A grapheme is the written representation of one phoneme—that is, a diórthosis lathón [Enriching morphological dictionaries with terms letter or group of letters that correspond to a single phoneme (Coltheart, and supporting term-intensive texts in error correction procedures]. 1978; Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001). In Praktika 2ou sinedriou “Elliniki Glossa kai Orologia” (pp. 157- 2. Greek letters and graphemes in the text are shown within angle 165). Athens. , to avoid confusion. Phonetic symbols refer to broadly tran- Treiman, R., Mullennix, J., Bijeljac-Babic, R., & Richmond- scribed phonetic realizations, not to any presumed underlying phono- Welty, E. D. (1995). The special role of rimes in the descrip- logical representations, and are shown within square brackets. ORTHOGRAPHIC TRANSPARENCY IN GREEK 1007

APPENDIX Graphophonemic Conversion Rules Table A1 Graphophonemic Conversion Rules, in Order of Rank Priority No. Pre GRA Fol PHO Opt Reg No. Pre GRA Fol PHO Opt Reg 1 (Å # 1.00 41 ——UÅ V g 9 .19 2 (Œ # 1.00 42 —UV g 9 .48 3 (Šڌ > 1.00 43 ——Å K .95 4 (Œ > 1.00 44 —Å P  1.00 5 (Å > .99 45 —Å K .99 6 ;;Å S 1.00 46 ÝÅ HP 1.00 7 ;Å S 1.00 47 ›ÚÅ R 1.00 8 oUÅ V › 9 .65 48 ›ÚŒ R 1.00 9 oÅ H › .92 49 ®®Å M 1.00 10 oGUÅ V ( 9 .11 50 ®Å M .96 11 oGÅ H ( 1.00 51 ¼¼Å / 1.00 12 oÅ K  1.00 52 ¼Å / 1.00 13 oGÅ D .99 53 ÉFÅ ƴ 1.00 14 oÅ – 1.00 54 À L W .98 15 YYÅ " — 55 ÀÀ P 1.00 16 YÅ " 1.00 56 ÀÅ P 1.00 17 Ž UV g 9 .13 57 ÁÅ P 1.00 18 eŠڌ # 1.00 58 ÉÉÅ Q 1.00 19 hŠڌ F 1.00 59 ÉÅ Q .94 20 ÌÅ 2 1.00 60 A Y;Å S 1.00 21 [ F 1.00 61 A Y­Å C 1.00 22 [Œ F 1.00 62 A YÅ N S 1.00 23 €€UÅ V @ 9 .00 63 A YÅ C 1.00 24 €UÅ V @ 9 .34 64  Ú F 1.00 25 €€Å H @ 1.00 65 Ú[ Œ F 1.00 26 €Å H @ .97 66 ­ C 1.00 27 €€Å H 1.00 67 OUÅ V ! 9 .17 28 €Å H 1.00 68 OÅ H ! .99 29 UÅ V ª 9 .43 69 OÅ U 1.00 30 UÅ V ª 9 .45 70 µÅ MP 1.00 31 Å I .99 71 ™Å L .97 32 Å I .98 72 ™Œ L 1.00 33 Ž®Å É J 1.00 73 B UÅ V ! 9 .26 34 Ž®Å ? 1.00 74 W UÅ V › 9 .23 35 ŽŽÅ J 1.00 75 UÅ F .95 36 ŽÅ J 1.00 76 T F 1.00 37 —Å ÉZ K .99 77 eÅ # .96 38 ÉßÅ AW 1.00 78 eŒ # 1.00 39 ßÅ W 1.00 79 ›Å L .99 40 —ÉÅ A 1.00 80 ›Œ L 1.00 Note—Pre, preceding context; GRA, grapheme; Fol, following context; PHO, phoneme; Opt, optional; Reg, regularity (i.e., proportion of rule-matching opportunities, taking rank order into account, in which the rule produces the correct pronunciation). Capital Latin letters refer to sets of letters (listed in Table A2) for which the rule applies.

(Continued on next page) 1008 PROTOPAPAS AND VLAHOU

APPENDIX (Continued) Table A2 Sets of Letters Defining Context or Graphemes for the Application of Graphophonemic Conversion Rules (Table A1) Symbol Letters in Set Description C ;,Åo,ÅY,Åß,ÅÌ,ŀ,Ł,Ŏ,ŗ,ÅÝ,Å®,ż,ÅÀ,ÅÁ,ÅÉ,Å­,ÅO,Å Consonants ŵ,Åo€,Ŏ®,ŗÉ,ÅÉß V e,ÅeŒ,ś,śŒ,ÅÚ,ÅÚŒ,Å(,Å(Œ,śÚ,śڌ,Å(,Å(Œ ,Åe,ÅeŒ,Åh, Vowels hŒ,Å,ÅŒ,ś,śŒ,ÅÚ,Åڌ,ř,řŒ S Ì,ŀ,ÅÝ,Å®,ÅÉ,ÅÀ,ÅÁ,Å­,ÅO,ŵŠUnvoiced consonants (UCs) L ;,Åo,ÅY,Åß,Ł,Ŏ,ŗ,ż,Åo€,Ŏ®,ŗÉ,ÅÉßÅ Voiced consonants (VCs) I e,ÅeŒ,ś,śŒ,ÅÚ,ÅÚŒ,Åh,ÅhŒ,Å,ÅŒ,ÅÚ,ÅڌŠVowels [F] E e,ÅeŒ,Å(,Å(ŒÅ Vowels [#] U e,ś,ÅÚ,Åh,Å,ÅÚÅ Unstressed vowels [F] T eŒ,śŒ,ÅÚŒ,ÅhŒ,ÅŒ,ÅڌŠStressed vowels [F] H e,ÅeŒ,ś,śŒ,Å(,Å(Œ,ÅÚ,ÅÚŒ,Åe,ÅeŒ,Åh,ÅhŒ,Å,ÅŒ,ÅÚ,Åڌ Front vowels ([F], [#]) A (,Åe,ÅhÅ Vowels combining with ,Ú- G o,ŀŠConsonants forming [D] Y Ú,Åڌ Vowels ,Ú- Z À,ÅÁ,ÅßÅ Sibilants B Ì,ÅÝ,Å®,ÅÉ,ÅÀ,Å­,ŵŠUCs with no palatal variant W ;,ÅY,Åß,ż,Ŏ®,ŗÉ,ÅÉßÅ VCs with no palatal variant N ;,Åo,ÅY,Åß,Ł,Ŏ,ŗ,ż,Åo€,Ŏ®,ŗÉ,ÅÉß,Å(,Å(Œ ,Åe,ÅeŒ,Å Vowels and VCs Åh,ÅhŒ,Å,ÅŒ,ś,śŒ,ÅÚ,Åڌ,ř,řŒ,Åe,ÅeŒ,ś,śŒ,ÅÚ,ÅÚŒ, Å(,Å(Œ,śÚ,śڌ P €,Åo,ÅO,ÅÝÅ Velar consonants K O,ÅÝ,ŀÉÅ Velar consonants not forming [D] F À,ÅÁÅ Consonants [P]

(Manuscript received September 23, 2008; revision accepted for publication May 7, 2009.)