Orthographic Neighbourhood in Czech Language and Its Use in Research of Reading Impairment Original Study
Total Page:16
File Type:pdf, Size:1020Kb
Linguistic Frontiers • 3(1) • 54–59 • 2020 DOI: 10.2478/lf-2020-0007 Linguistic Frontiers Orthographic neighbourhood in Czech language and its use in research of reading impairment Original Study Monika Ptáčková, Kateřina Vitásková Palacký University Olomouc Received: June 2020 ; Accepted: June 2020 Abstract : Reading is a complex function that affects our daily living. It requires the cooperation of many structu- res and functions and is influenced by many different factors. One of the factors influencing the word reading process may be the characteristics of the word, such as its frequency of use in the language, its length, ortho- graphic neighbourhood, and others. These characteristics can affect the speed and accuracy of reading in re- aders with and without learning disability. The following paper presents some aspects of our current research focused on the influence of the properties of the word on their reading in the Czech language environment. The concept of orthographic neighbourhood will be explained in more detail including some specifics in its calcu- lation and use in the Czech language. Keywords : reading, orthographic neighbourhood, Damerau-Levenshtein distance, visual word recognition INTRODUCTION Reading is a complex process consisting of many impor- on visual word recognition and reading in people with tant components, such as visual recognition, phonolo- dyslexia (see Davies, 2007) and in the typical population gical and orthographic processing, and others (Grainger (see Grainger 2017). Due to the variance between lan- 2017). There is a variety of different impairments of guages, the results of research studies differ as well1. reading ability, which can be generally divided into two Although in the last ten years many kinds of research in main groups—developmental reading impairments (see various language environments have been performed, e.g. Penolazzi et al. 2009) and acquired reading impair- most experiments are still based on the English language ments (see e.g. Karanth 2003). In the present research, and English speakers (Share 2008). the authors focus mainly on the developmental reading The important word characteristics include for exam- disorders, in some classifications also called dyslexia ple word length, word frequency (frequency of using (see e.g. Hulme, Snowling 2016). Dyslexia research in- the word), and orthographic neighbourhood (ON). The volves a lot of interesting aspects including linguistic frequency effect generally means that reading of high- characteristics of words. It suggests how the reading -frequency words is usually faster and more correct than process is involved in some of these word characteris- low-frequency words (Ghyselink et al. 2004). This is noti- tics in people with or without reading impairment. It can ceable in many languages, such as English (Grainger 2017), then help us understand reading disorders better in the Spanish (González-Nosti et al. 2013), etc. Ghyselinck et whole context of language acquisition. al. (2004) also mentioned “the cumulative frequency hy- In foreign literature, there is plenty of research on pothesis“, which combines the effect of frequency and this topic focusing on the effect of word characteristics age of word acquisition. Many studies focus on other 1 For example, in Spanish (Gonzáles-Nosti et al. 2014) the effect of orthographic neighbourhood density was shown only in short words, and there was stronger length effect than in English. Open Access. Open Access. © 2020 Monika Ptáčková, Kateřina Vitásková, published by Sciendo. This work is licensed under the Creative Commons BY 4.0 license Orthographic neighbourhood in Czech language and its use in research of reading impairment variables related to word frequency, such as the age of literature—either an inhibitory effect, no effect, or a facili- acquisition or frequency trajectory 2 (Juhasz et al. 2019). tatory effect on reading and visual word recognition. For Information about word frequency can be found in lists example, Weekes et al. (2006) suggests a hypothesis that based on the national corpus. As far as the Czech lan- the ON effect changes during development. A research guage in concerned, the Czech National Corpus website study by Andrews and Hersch (2010) suggests a conclu- also includes the frequency list 3. sion that in individuals with worse spelling skills, there is As for the word length effect, it also represents a higher facilitatory effect of the orthographic neighbou- a broad topic which has been in the centre of attention rhood, while in individuals with good spelling skills the of many studies. Word length usually affects the speed effect is generally inhibitory. The question is how language and accuracy of its recognition. Word length can also affects this variable. Furthermore, properties such as word change the influence of other properties, for example in length, frequency, and orthographic neighbourhood are a Spanish research study (Gonzales-Nosti et al. 2014), relatively closely related because short words are generally the effect of orthographic neighbourhood was significant more frequent and have more orthographic neighbours only in a group of short words. Schroeter and Schroe- than long words (largely addressed by Gonzales-Nosti et der (2014) point out the fact that the word length effect al. (2014) in their research). decreases during life with the strongest word length Many recent research studies suggest another appro- effect being usually in childhood. ach to the “orthographic neighbourhood density” value, The orthographic neighbourhood effect is the focus which is The Damerau-Levenshtein distance. As reported of the present research and is not frequently mentioned by Yarconi et al. (2008), it can be more exact than the in Czech scientific literature. The orthographic neighbou- traditionally used Coltheart’s N. The Damerau-Leven- rhood may be explained as the quality of a word, which shtein distance (DL) also reports about word similarity describes the number of words that can be created from compared with other words in the corpus. In this way, the word by changing a single letter (Grainger et al. 2005). the word is compared with all words, not only words This means for example that the Czech word “klíč” has of the same length. This value reflects the minimum 3 orthographic neighbours (“kleč”, “klín” and “klít”), while number of operations needed to change the word to the word “hmyz” has no orthographic neighbours be- another one. An operation is considered substitution, cause no existing Czech word is generated by changing addition, deletion, or transposition of one letter (two one letter to another in this letter string. Some authors letters in the case of transposition). The resulting value use the term “orthographic neighbourhood density” (La- is the sum of the 20 lowest values. In other words, it su- xon et al. 2002) or “Coltheart’s N” (Yarkoni et al. 2008) ggests how many operations are needed to transform which refers to the same concept. the word into the twenty closest words. This means that, Analogically, phonological neighbourhood refers to for example, in the Czech language the word “soused”, how many words can be made by changing one phoneme which has 0 orthographic neighbours according to the to another (Clarkson et al. 2017). This is more important Coltheart’s N (no other word arises by changing one in languages with non-transparent orthography such letter), has a DL value of 32, which is quite low as in 20 as English or French, while in the Czech language (with proximate words there is only 1 or 2 operations needed transparent orthography) similar to German, in most ca- to change to the word “soused”. ses the orthographic and phonological neighbourhood The present research uses both the Coltheart’s N and density is the same (Marian et al. 2012). LD distance, so the influence of both values and their The impact of the orthographic neighbourhood on rea- differences can be compared. ding is examined by means of various types of tasks. For example, as used in the present research, a lexical decision task can be used in which the participant has to decide EXPERIMENTAL PROCEDURES as soon as possible whether the presented letter string is The aim of this paper is to introduce a tool which the an existing word or a pseudoword. In this type of experi- authors developed for counting the Coltheart’s N and DL ment, reaction times and error rates are reported. Another distance in the Czech language and propose its applica- approach is the use of technology-based methods, such tion in ongoing research. The research aims to answer as data collection from eye-tracking or fMRI during word whether or not the Coltheart’s N and Damerau-Leven- reading (for more information see for example Schloss shtein distance can be used in reading research in the 2017). This method provides more comprehensive infor- Czech language and what the specifics of its use are in mation, but at the same time the procedure is difficult and the Czech orthography system. sometimes it is not possible to examine sufficient number Foreign literature includes many tools for ON calcula- of participants and collect sufficient data. tion such as N-watch, which is a program that works with There are various findings concerning the ortho- any language database (corpus). Using this programme, graphic neighbourhood effect on reading in foreign it is possible to identify the same characteristics as in the 2 Frequency trajectory expresses how common is the word in childhood and adulthood. 3 Available at: <https://wiki.korpus.cz/doku.php/seznamy:srovnavaci_seznamy> 55 Ptáčková, Vitásková case of the Coltheart’s N only by insertion of source data In[4]:= DamerauLevenshteinDistance ["máma", "máslo"] (national corpus) and the list of words one is interested Out[4]= 3 in (Davis 2005). Unfortunately, this tool does not work witch Czech diacritics— ´ and ˇ, so it is not practicable Figure 1: Example of WolframMathematica using the for the purposes of the present research.