
The grapho-phonological system of written French: Statistical analysis and empirical validation

Marielle Lange and Alain Content
Laboratory of Experimental Psychology, Université Libre de Bruxelles
Av. F.D. Roosevelt, 50, Bruxelles, Belgium, B 1050 Bruxelles
[email protected] [email protected]

Abstract
The processes through which readers evoke mental representations of phonological forms from print are a hotly debated issue in current psycholinguistics. In this paper we present a computational analysis of the grapho-phonological system of written French, together with an empirical validation of some of the descriptive statistics it yields. The results provide direct evidence that both grapheme frequency and grapheme entropy influence performance on pseudoword naming. We discuss the implications of these findings for current models of phonological coding in visual word recognition.

Introduction
One central characteristic of alphabetic writing systems is the existence of a direct mapping between letters or letter groups and phonemes. In most languages, although to varying extents, the mapping from print to sound can be characterized as quasi-systematic (Plaut, McClelland, Seidenberg, & Patterson, 1996; Chater & Christiansen, 1998). Thus, descriptively, in addition to a large body of regularities (e.g., the grapheme CH in French regularly maps onto /ʃ/), one generally observes isolated deviations (e.g., CH in CHAOS maps onto /k/) as well as ambiguities. In some cases, though not always, these difficulties can be alleviated by considering higher-order regularities such as the local orthographic environment (e.g., C maps onto /k/ or /s/ depending on the following letter), phonotactic and phonological constraints, and morphological properties (cf. PH in PHASE vs. SHEPHERD). An additional difficulty stems from the fact that graphemes, the orthographic counterparts of phonemes, can consist either of single letters or of letter groups, as the previous examples illustrate.

Psycholinguistic theories of visual word recognition have taken the quasi-systematicity of writing into account in two opposite ways. In one framework, generally known as dual-route theories (e.g., Coltheart, 1978; Coltheart, Curtis, Atkins, & Haller, 1993), it is assumed that the dominant mapping regularities are abstracted into a table of grapheme-phoneme correspondence rules, which can then be looked up to derive a pronunciation for any letter string. Because the rule table captures only the dominant regularities, it must be complemented by lexical knowledge to handle deviations and ambiguities (e.g., CHAOS, SHEPHERD). The opposite view, based on the parallel distributed processing framework, assumes that the whole set of grapho-phonological regularities is captured through differentially weighted associations between letter-coding and phoneme-coding units of varying sizes (Seidenberg & McClelland, 1989; Plaut, McClelland, Seidenberg, & Patterson, 1996).

These opposing theories have fueled a complex empirical debate for a number of years. This controversy is one instance of a more general issue in cognitive science, which bears on the proper explanation of rule-like behavior. Is the language user's capacity to exploit print-sound regularities, for instance to generate a plausible pronunciation for a new, unfamiliar string of letters, best explained by knowledge of abstract all-or-none rules, or by knowledge of the statistical structure of the language? We believe that, in the field of visual word processing, the lack of precise quantitative descriptions of the mapping system is one factor that has impeded the resolution of these issues.

In this paper, we present a descriptive analysis of the grapheme-phoneme mapping system of French orthography, and we further explore the sensitivity of adult readers to some characteristics of this mapping. The results indicate that human naming performance is influenced by the frequency of graphemic units in the language and by the predictability of their mapping to phonemes. We argue that these results implicate graded knowledge of grapheme-phoneme mappings and hence that they are more consistent with a parallel distributed approach than with the abstract-rules hypothesis.

1. Statistical analysis of the grapho-phonological correspondences of French

1.1. Method
Tables of grapheme-phoneme associations (henceforth, GPAs) were derived from a corpus of 18,510 French one- to three-syllable words from the BRULEX database (Content, Mousty, & Radeau, 1990), which contains orthographic and phonological forms as well as word frequency statistics. As noted above, because graphemes may consist of several letters, segmenting letter strings into graphemic units is a non-trivial operation. A semi-automatic procedure similar to the rule-learning algorithm developed by Coltheart et al. (1993) was used to parse words into graphemes.

First, grapheme-phoneme associations are tabulated for all trivial cases, that is, words that have exactly the same number of graphemes and phonemes (e.g., PAR, /paR/). Then a segmentation algorithm is applied to the remaining unparsed words in successive passes. The aim is to select words for which the addition of a single new GPA would resolve the parsing. After each pass, the newly hypothesized associations are checked manually before being included in the GPA table.

The segmentation algorithm proceeds as follows. Each unparsed word in the corpus is scanned from left to right, starting with larger letter groups, to find a parsing based on tabulated GPAs that matches the word's phonology. If this fails, a new GPA is hypothesized if there is only one unassigned letter group and one unassigned phoneme and their positions match. For instance, the single-letter grapheme-phoneme associations tabulated at the initial stage would be used to mark the P-/p/ and R-/R/ correspondences in the word POUR (/puR/) and to isolate OU-/u/ as a new plausible association.

When all words had been parsed into graphemes, a final pass through the whole corpus computed the grapheme-phoneme association frequencies, based both on a type count (the number of words containing a given GPA) and on a token count (the number of words weighted by word frequency). Several statistics were then extracted to provide a quantitative description of the grapheme-phoneme system of French. (1) Grapheme frequency: the number of occurrences of the grapheme in the corpus, independently of its phonological value. (2) The number of alternative pronunciations of each grapheme. (3) Grapheme entropy, as measured by H, the information statistic proposed by Shannon (1948) and previously used by Treiman, Mullennix, Bijeljac-Babic, and Richmond-Welty (1995). This measure is based on the probability distribution of the phoneme set for a given grapheme, H = -Σ p_i log2 p_i, where p_i is the probability that the grapheme is pronounced as phoneme i, and it reflects the degree of predictability of the grapheme's pronunciation. H is minimal and equals 0 when a grapheme is invariably associated with one phoneme (as for J and /ʒ/). H is maximal and equals log2 n when there is total uncertainty, that is, when all n phonemes are equally likely. In this limiting case, n would correspond to the total number of phonemes in the language (thus, since there are 46 phonemes, max H = log2 46 = 5.52). (4) Grapheme-phoneme association probability, which is the GPA frequency divided by the total …

… in the transcoding from sound to spelling, French orthography is generally claimed to be very systematic in the reverse conversion of spelling to sound. The latter claim is confirmed by the present analysis: the grapheme-phoneme association system of French is globally quite predictable. The GPA table includes 103 graphemes and 172 associations, and the mean association probability is relatively high (0.60). Furthermore, the distribution of grapheme-phoneme association probabilities (Figure 1) reveals that more than 40% of the associations are completely regular and unambiguous. When multiple pronunciations exist (on average, a grapheme has 1.70 pronunciations), the alternative pronunciations generally have low GPA probability values (below 0.15).

The predictability of GPAs is confirmed by a very low mean entropy value: 0.27 across all graphemes. As a point of comparison, if each grapheme in the set were associated with two phonemes with probabilities of 0.95 and 0.05, the mean H value would be 0.29. There is no notable difference in predictability between vowels and consonants. Finally, it is worth noting that in …

Figure 1. Distribution of grapheme-phoneme association probability values, based on type measures.

Figure 2. Distribution of grapheme entropy (H) values, based on type measures. Most unpredictable graphemes (H > .90): vowels e, oe, u, ay, eu, ï; consonants x, s, t, g, ll, c.

Predictability of Grapheme-Phoneme Associations in French

              Number of       GPA probability  GPA probability  H           H
              pronunciations  (type)           (token)          (type)      (token)
              M (SD)          M (SD)           M (SD)           M (SD)      M (SD)
All           1.70 (1.26)     .60 (.42)        .60 (.43)        .27 (.45)   .23 (.42)
Vowels        1.66 (1.12)     .60 (.41)        .60 (.44)        .29 (.48)   .21 (.41)
Consonants    1.76 (1.23)     .60 (.42)        .60 (.42)        .25 (.42)   .26 (.44)

Table 1. Number of different pronunciations of a grapheme, grapheme-phoneme association (GPA) probability, and entropy (H) values, by type and by token, for French polysyllabic words.