Rika Plat & Wander Lowie & Kees de Boot

The Colour of Noise: Dynamic Variability Patterns of L2 Lexical Processing

Series A: General & Theoretical Papers ISSN 1435-6473 Essen: LAUD 2012 Paper No.A775

Universität Duisburg-Essen

Rika Plat & Wander Lowie & Kees de Boot

University of Groningen, The Netherlands

Copyright by the author Reproduced by LAUD 2012 Linguistic Agency Series A University of Duisburg-Essen General and Theoretical FB Geisteswissenschaften Paper No.775 Universitätsstr. 12 D- 45117 Essen

Order LAUD-papers online: http://www.linse.uni-due.de/linse/laud/index.html Or contact: [email protected]

Abstract

Lexical knowledge is not stable and unchanging within an individual; instead, it is constantly influenced by experience and context-dependent language use. Lexical knowledge is therefore highly variable and should be regarded as inseparable from the time and context in which it is used. Consequently, the variability in language production has to be regarded as a source of information rather than as meaningless noise. In this paper, we report on a study in which a single participant took part in a word naming experiment in his mother tongue (L1) and his second language (L2) over a period of two years. The lexical processing data resulting from this experiment are explored using linear and non-linear statistical methods to make sense of the variability in L1 and L2 language production on a variety of time scales.

The Dynamics of Lexical Knowledge

L2 acquisition is a variable and dynamic process, during which the linguistic knowledge system is constantly changing and reorganizing, resulting in the addition of new knowledge to the system, but also in loss of information. However, the amount of variability in the L2 acquisition process is still often underestimated and disregarded. The bulk of research on L2 acquisition seems to assume that language acquisition is linear and that in the course of time only new knowledge is gained; however, as has now repeatedly been demonstrated, language acquisition is by its nature constantly open to outside influence and thus constantly affected by it, in a nonlinear way (see, for instance, Lowie & Verspoor, 2011). Studies of language acquisition that look at means over a large group of learners offer a lot of useful information, but a closer look at individual variation shows that a great deal of valuable information concerning actual, real-time development is filtered out and overlooked in group studies.

L2 acquisition is a very individual process, since it depends on a multitude of dynamically interacting factors leading to constant variability. This variability arises both from a constantly changing language environment and from self-organisation within the system. The dynamic interactions are not confined to the internal system; the external influence, and thus the context in which a language is learned and used, can in itself also be seen as a dynamic system that is constantly changing. In this view, context can never be reduced to a mere backdrop against which language is learned; it is in constant interaction with the language learner and vice versa, and cannot possibly be separated from the learner.

The context in which language develops includes many components that will have different effects on different learners. There is for instance the cultural context, which includes the role of student and teacher in a particular cultural environment. There is the social context, including the relationship with the teacher and other learners, and the educational context, including what materials are used. Every individual also has a different starting point due to a unique history, experiences, intelligence etc. All of these contextual factors, and many more not mentioned here, determine the developmental path of language acquisition. This interaction between language user and context works both ways; a language user will adapt to the contextual conditions, and the environment will adapt in response to the language user’s actions (Larsen-Freeman, 2008). Since context and language user thus co-evolve, it would be untenable to separate them and try to explain language learning as if it took place independently of context.

Apart from the context that causes variation among individual language learners’ developmental paths, every learner also has internal variability inherent to the developmental process. In a self-organising system, variability is necessary for the system to develop. An increased amount of variability is in fact often a precursor to a jump in development to a higher level of performance. This was also found in the L2 writing performance of an advanced student of English whose writing over the course of three years was analysed for sentence complexity and vocabulary use. Measurements of average word length showed a relatively stable period that was followed by a period showing many fluctuations in performance (Verspoor, Lowie & van Dijk, 2008). After this period of variability, performance stabilized again at a higher level. The high amount of variability therefore seems to be a consequence of a necessary re-organisation of the system that enables a big step in development.

That active second language learning means constant change may not sound very surprising, since a language learner is by definition adding new words and acquiring new grammatical patterns all the time. However, the idea that even one’s L1 is in constant flux is often implicitly denied in the way language research is conducted. When measuring L2 performance, L2 speakers are almost always compared to a control group of L1 speakers, the assumption being that the performance of L1 speakers provides a static baseline. Even though the L1 is usually quite entrenched in a speaker’s mind, it is still developing over time. A large body of attrition research shows that the L1 is not immune to loss when it is not used over an extensive period of time. The area most susceptible to loss in the L1 is lexical access; grammatical knowledge seems to be quite stable, since even when L1 attrition is quite severe, people long retain a grammatical knowledge surpassing that of all but the most advanced L2 speakers (Schmid 2010). However, it is possible to lose the L1 completely, as shown by Pallier et al. (2003), who tested Korean children that had been adopted in France. Some of their subjects had used their native language for as much as 8 years; however, when presented with Korean words and sentences when they were in their 20s, they could not distinguish these any better than a group of French control subjects could.

Even without a drastic change in circumstances such as migration or adoption, the L1 shows a great deal of variability between individuals. Sparks & Ganschow (1993) found that poor performance in learning an L2 could be traced back to native language problems. Otherwise successful students who were not successful in learning a second language often had limited linguistic coding skills in their L1. Based on this finding, they formulated their Linguistic Coding Differences Hypothesis (LCDH), which states that difficulties with the rule system of the first language correspond directly to related problems in learning a second language. Looking at language development closely reveals a lot of variation in the ultimate level of attainment of both the L1 and the L2.

Storage of Lexical Knowledge

The amount of variability in language use between and within individual learners has implications for how lexical knowledge may be organised in the brain, which in turn should influence the ways linguists devise their experiments in order to get a glimpse of what this organisation might be like. This may seem like a self-evident statement; however, really accepting that language is a complex system should entail completely rethinking some of the experimental methods that have become widely accepted in linguistic research. Even though more and more linguists assume language to be organised as a structured network in the brain that is continuously developing, many of these linguists still use ‘old-school’ experimental methods that seem to suppose the lexicon to be a mere list of words not unlike a dictionary, and representations to be fixed and stable, even across groups of people. An example of the first presumption can still be found in a lot of attrition studies, where the level of attrition is usually determined by administering translation tasks; the number of words a subject does not remember is taken to correspond directly to the percentage of vocabulary the subject has supposedly lost (Meara, 2002). Also, the often-used paradigm of lexical priming implicitly denies the variability of lexical representations, in assuming that a prime has a similar and fixed effect on reaction times across many individuals.

One way of theorising about how language is organised in the brain that is consistent with language as a complex dynamic system is connectionism. Connectionism seeks to explain cognitive processes by using computer simulations of neural networks. Recent work in this area that tries to incorporate a more dynamic view of the lexicon in attrition research has been conducted by Paul Meara (2004).
Meara moves away from the strong focus on the individual word or lexical entry, and uses a Boolean network to find out what the implications for attrition research are when thinking of the lexicon as a structured network. If one assumes, as Meara does, that words are connected, and that the activation of one lexical item influences the activation level of other lexical items it is connected with, attrition is by no means a simple and linear process in which words get removed from the lexicon one by one. His computer simulation of a simplified lexicon shows that enough attrition events can cause a ‘ripple’ throughout the system; once enough words are deactivated, entire parts of the network can become deactivated. The loss of every particular lexical item, though not necessarily causing one such ‘ripple’, will thus weaken the structure of the network. Also, every one of his simulations takes a different course, which could suggest that attrition is a different process for every individual.

The above-mentioned results all pertain to simulations of a single-language lexicon. Meara (2006) has, however, also tried to test the difference in stability of a lexicon that contains an L1 and an L2. Another Boolean network consisting of 1,000 L1 and 1,000 L2 words where only 2 percent of the words are ‘entangled’ (linked to and thus receiving activation from both an L1 and an L2 word) quickly settles into an attractor state: two thirds of the L1 words being active versus only 10 percent of the L2 words being active. ‘Forcing’ the L2 to become more active by activating 15 words at each iteration of the model leads to activation of almost the entire L2 lexicon. Interestingly, this automatically suppresses activation in the L1, without an inhibitory mechanism having been placed in the system. After ‘forcing’ of the L2 stops, the system returns to its relatively stable attractor state, the L1 again becoming the dominant language. The ‘forcing’ of the system could be seen as usage of the L2, and the interesting feature of networks that Meara uncovers here is that even though a system may not be explicitly programmed to do so, the activation of one subset can inhibit the activation of another one.

Meara’s work discussed above deals mostly with the global properties of the lexicon, and his computer simulations of a lexicon are hugely simplified in order to find out the basic features a lexicon should have. Elman (2004, 2009, 2011) has taken a closer look at what specific information should actually reside in the lexicon and how a lexical representation should be defined.
Traditionally, the mental lexicon is thought to be a passive data structure that resides in long-term memory, in which at the very least the meaning of a word is stored. The lexicon is often assumed to contain ‘types’, abstract representations of what is known about a certain word. All instances of the same word are then ‘tokens’ of this word. The meaning of each ‘token’ that we come across can be interpreted by way of this ‘type’ stored in our mental lexicon. Elman (2004) argues, however, that the meaning of a word is almost always context-dependent, so that a word hardly ever means exactly the same thing twice. Rather than having abstract representations of words stored in a mental lexicon, Elman proposes to think of words as having an effect similar to that of other kinds of sensory stimuli: as acting directly on mental states. Elman sums up this view by claiming that “words do not have meaning, they are cues to meaning” (2004, 306). In this view, there is no need for a lexicon in the traditional sense, and there are no ‘types’ stored in our brain, since we never encounter an actual ‘type’. Instead, as Elman puts it, “lexical knowledge is implicit in the effects that words have on internal states”. The effect a word will have on a mental state is always going to be a little bit different, since its effect is unavoidably influenced by the ever-changing context. The mental state evoked by a word is for instance found to be slightly different when the word occurs with different agents. This poses a problem for the mental lexicon in the way it is traditionally understood; obviously, a word will not have a separate entry for its use with each separate agent. Nor is it feasible to have different entries for the different tenses of a verb, the use of different patients with the verb, location, the filler of the instrument role, or the information given in the broader discourse context, which have all been found to contribute to expectations regarding the arguments a verb will take (Elman, 2009). Even though there is no room for a stable, abstract representation in this view, words can still be recognised as different instances of the same meaning, by envisaging every lexical representation as inhabiting a bounded region of state space. Every occurrence of one word will then, even if it never produces the exact same state ever again, produce a state within this same bounded region, and will thus be correctly identified.

Spivey & Dale (2004) also oppose the idea of a mental lexicon containing stable, discrete, symbolic representations. The continuous, temporal dynamics of cognitive processing in the brain would make it impossible to distinguish between discrete mental representations. In fact, according to Spivey & Dale, these “graded mental states appear to be more than just temporary transitions between discrete mental representations, but instead may be the modus operandi of the mind” (2004, 91). They do not want to argue that pure mental states do not exist as attractor positions in state space, but rather that they hardly ever occur. In real time, all clues to categorise and thus recognise something are used, leading to a period of competition between different possibilities, up until the point where the clues only leave one option; that is, when the activation reaches a ‘basin of attraction’ that surrounds the attractor point that is the ‘pure’ mental state. In conclusion, Spivey & Dale find that in language processing, the time spent in unstable regions of state space, travelling toward an attractor basin, is usually much greater than the time spent in a relatively stable region of state space.

Analysing Noise

This temporal continuity of language processing found by Elman and by Spivey & Dale shows the importance of including time when looking at linguistic behaviour. In complex dynamic systems, each step in time and each step in a process will influence the next. One by-product of the continuous processing on multiple time scales can be found in the variation or noise signal that every experiment produces. Noise is usually discarded on the assumption that it is random. However, noise is more and more the subject of investigation and speculation, since in processes that rely on the interaction of many components it is often found not to be random at all. In self-organising systems, noise is often found to show a pattern and to correlate over shorter and longer time scales. Both short-range correlations (indicating that the response on one item influences the response on the following item) and long-range correlations are found in a wide variety of cognitive tasks (Van Orden, Holden & Turvey, 2003; Holden, 2005; Thornton & Gilden, 2005). Correlations that are both short- and long-range indicate that components within the system interact on multiple time scales. This is a sign of self-organisation taking place on different levels within the system.

Correlated noise can be observed when visually inspecting a trial series, since it will show “a progression of nested, similarly shaped arcs or patterns of fluctuation” (Holden 2005, 289). A time series that is thus composed of smaller copies of itself possesses a so-called fractal structure. In response time experiments, data with a fractal structure are said to contain pink noise. Pink noise refers to measures that are statistically dependent upon one another; long-term fluctuations nest within themselves increasingly smaller, proportionately scaled fluctuations, which in turn nest within themselves even smaller patterns of fluctuation, etc.

Pink noise can be said to reside in between two other types of noise that can be found in response time data (Kloos and Van Orden, 2010). For one, there is the noise that really is random; this type of noise is often referred to as white noise. A spectral plot can be used to decompose a trial series into sine waves of different amplitudes. These amplitudes relate to the sizes of the changes in a data series. A spectral plot made of random or white noise will show that changes of every size occur with equal frequency, and a line fitted to a spectral plot showing white noise will have a slope of approximately 0 (α ≈ 0), indicating that there is no correlation between changes of a certain size and the number of times a change of this size occurs in the trial series. A spectral plot made of pink noise will show that changes of different sizes do not occur with the same frequency. Figure 1 below shows the typical scaling relation of pink noise.

Figure 1. A spectral plot that shows the typical scaling of pink noise (Figure 1 from Kloos and Van Orden, 2010). Upper right: reaction times of one subject. Lower right: spectral plot of reaction times with an average slope of -0.94 and four marked points referring to sinusoidal components displayed on the left.

The upper right plot displays the reaction times of one subject. The lower right shows a spectral plot constructed from these data; relative amplitude, or size of change, can be found on the vertical axis. The horizontal axis shows the frequency of change. The regression slope shows the scaling relation between the two, in this case -0.94, consistent with the scaling exponent of pink noise, which is α ≈ 1. The plots on the left show the spectral plot decomposed into sine waves of different amplitudes. The sine wave on the bottom left shows relatively small changes in the data that occur very frequently; these can be found in the many clustered dots in the bottom right of the spectral plot. On the top left the sine wave shows the big changes; these do not occur very often. This relates to the top left of the spectral plot; only a few dots show the few changes that occur of this size (Kloos and Van Orden, 2010).

The third type of noise is referred to as brown noise. Brown noise displayed in a spectral plot will show a steeper slope than pink noise, of α ≈ 2. Where white noise is associated with random behaviour and does not show correlation between measurements, brown noise is associated with over-regular behaviour, and shows very strong dependence between measurements. Pink noise, then, is found between over-random behaviour (white noise) and over-regular behaviour (brown noise), and can be observed when there is a balance between the two (Kloos and Van Orden, 2010). It thus allows for both regular and random behaviour. Rigid, over-regular control only works in a very predictable environment, but fails when the environment becomes less predictable; over-random performance allows for flexible behaviour, but cannot take advantage of the predictable features of the environment. Behaviour that allows for both options thus offers an “optimal combination between stability and flexibility in control” (Kloos and Van Orden 2010, 30).

Behaviour that allows two opposing options constitutes an example of a critical state. In order for this balance to be maintained, critical states function as an attractor state. There is evidence that pink noise is a by-product of attractor states, since development and training have been found to change behaviour that elicits random white noise into behaviour that elicits pink noise. This was found in a precision aiming task in which participants had to draw lines as fast as possible between two dots that were 24 cm apart using their non-dominant hand (Wijnants et al., 2009). The idea was that forcing participants to use their non-dominant hand would induce relatively unstable and uncoordinated behaviour that would leave plenty of room for improvement. Participants completed five blocks of 1,100 trials with 3-minute breaks in between the blocks. The time it took to trace from one dot to another was measured. Trace times on the early blocks were found to be quite random and to show a scaling exponent around 0, consistent with white noise and thus irregular and uncoordinated behaviour. With practice, the trace times of the later blocks showed scaling exponents that approach -1, the scaling exponent of pink noise. This trend can be seen as attraction toward pink noise (Wijnants et al., 2009; Kloos and Van Orden, 2010).

Dynamics dependent on large numbers of interacting components are expected to generate pink noise. It is the interdependence among the many interacting components that explains the noise being correlated.
As explained above, linguistic behaviour is the result of many interacting components, from linguistic capacity inherent to the speaker to contextual factors and experiences. Pink noise has indeed been observed in language tasks. Van Orden et al. (2003) conducted a simple reaction time experiment and a word naming experiment. For the simple reaction time experiment, subjects were required to say /ta/ quickly into a microphone when a signal to respond (#######) appeared on the screen. This simple reaction time experiment, consisting of 1,100 trials, yielded spectral slopes ranging from -1.00 to -0.30, with a mean of -0.66. This is consistent with the scaling exponent of pink noise. The word naming task also consisted of 1,100 trials, and yielded spectral slopes that ranged from -0.49 to -0.14, with a mean of -0.29. This was also found to be consistent with pink noise, albeit partly decorrelated pink noise, given that the simple reaction time experiment yielded steeper spectral slopes. Random word properties undoubtedly elicited some effect that could not be controlled for, even though the experiment lacked any overt conditions. Even though these word properties decorrelated the noise signal slightly, the spectral slopes were still found to be consistent with pink noise, and were found to reliably differ from randomised data (and thus white noise).
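As a minimal illustration of the contrast between uncorrelated and strongly correlated noise described in this section, the following Python sketch compares the lag-1 autocorrelation of simulated white noise with that of its running sum (a random walk, which behaves like brown noise). The function name, the series length and the random seed are illustrative assumptions and are not taken from any of the studies cited.

```python
import random
from statistics import mean


def lag1_autocorrelation(series):
    """Correlation between each observation and the one that follows it."""
    mu = mean(series)
    numerator = sum((a - mu) * (b - mu) for a, b in zip(series, series[1:]))
    denominator = sum((x - mu) ** 2 for x in series)
    return numerator / denominator


random.seed(0)  # arbitrary seed, only to make the sketch repeatable

# White noise: every value is drawn independently of the previous one.
white = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Brown noise (random walk): every value is the previous value plus a random step.
brown = []
level = 0.0
for step in white:
    level += step
    brown.append(level)

white_r1 = lag1_autocorrelation(white)
brown_r1 = lag1_autocorrelation(brown)
print(f"lag-1 autocorrelation, white noise: {white_r1:.2f}")
print(f"lag-1 autocorrelation, brown noise: {brown_r1:.2f}")
```

The white series should show a lag-1 correlation close to zero and the random walk a value close to one; a pink noise series would be expected to fall between these two extremes. Lag-1 autocorrelation only captures short-range dependence, so it complements rather than replaces the spectral slope.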

The experiment

Pilot

The goal of the experiment was to look at variability inherent to language processing. Therefore, the variability caused by different word properties had to be reduced to a minimum. For this purpose orthographically similar, four- and five-letter, one-syllable words were selected. Two versions of the experiment were made: a Dutch and an English version. Half of the subjects started with the Dutch version, the other half with the English version. All words were frequent words in the target language, and easy to pronounce. For both languages, 305 words were selected. Five of these words were meant to let participants become acquainted with the procedure and get used to the microphone. The other 300 words were presented to them in three separate random blocks, with a minute break in between. A pilot was conducted in order to select the 200 words that showed least variation between individuals. Six participants took part in the pilot. All were native speakers of Dutch and advanced learners of English as a second language; all were students of English at the . The results of the pilot are shown in Table 1 below:

Table 1. Mean RTs on both language versions of the pilot for 6 subjects

Participant   Mean RTs Dutch (ms)   Mean RTs English (ms)
P. 1          458                   493
P. 2          462                   482
P. 3          450                   487
P. 4          442                   495
P. 5          453                   442
P. 6          467                   461

Based on this pilot, the z-scores of the reaction times were calculated for all of the items used in the pre-test. The 100 items with the highest z-scores, thus showing the highest variation, were removed from both the English and the Dutch language lists. This left 205 items per language that showed minimal variation between items. The 200 Dutch and English words with the lowest SDs between individuals were selected for the experiment.
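The selection step can be sketched in Python as follows. This is one possible reading of the procedure and not the script that was actually used: reaction times are z-scored within each participant, and the items whose z-scores differ least between participants are kept. The function name and the four-item data set are hypothetical, and the use of the population SD is an assumption.

```python
from statistics import mean, pstdev


def select_stable_items(rts, n_keep):
    """Return the n_keep items whose z-scored RTs vary least between participants.

    rts maps each item to a list of RTs in ms, one per participant,
    with participants in the same order for every item.
    """
    items = list(rts)
    n_participants = len(rts[items[0]])

    # z-score every RT against the mean and SD of the participant who produced it
    z_scores = {item: [] for item in items}
    for p in range(n_participants):
        column = [rts[item][p] for item in items]
        mu, sd = mean(column), pstdev(column)
        for item in items:
            z_scores[item].append((rts[item][p] - mu) / sd)

    # spread of an item's z-scores across participants (lower = more stable)
    spread = {item: pstdev(z_scores[item]) for item in items}
    return sorted(items, key=lambda item: spread[item])[:n_keep]


# Hypothetical pilot data: four items, three participants (RTs in ms).
pilot_rts = {
    "boek": [450, 460, 455],
    "stad": [470, 480, 476],
    "kerk": [440, 520, 400],  # large differences between participants
    "melk": [455, 466, 460],
}

selected = select_stable_items(pilot_rts, n_keep=3)
print(selected)
```

In this toy data set the item with the most divergent responses across participants ("kerk") is the one that is dropped; with the real pilot data the same call with n_keep=200 would be applied to each language list separately.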

Single subject study

The goal of the experiment was to look at the performance of one subject over time. One participant took part in the experiment repeatedly over a period of 2 years. The subject always started with the Dutch language version. The experiment consisted of the 200 four- and five-letter words selected from the pilot. The words were presented to the subject in a fixed order. The subject was instructed to read these words aloud into a microphone. The reaction times and responses were recorded.

Participant

The participant was (at the onset of testing) a 57-year-old male professor of Linguistics at the University of Groningen. The participant was a native speaker of Dutch and an advanced learner of English.

Materials

For both the English and the Dutch version of the experiment, the words used were selected from the CELEX/Cobuild lexical database. The goal was to reduce variation based on the choice of items to a minimum. Therefore, all words had a CV onset, consisted of one syllable, and were easy to pronounce. Both the pilot and the actual experiment were run in E-Prime 1.2 (Psychology Software Tools, 2001). The words that appeared on the screen had to be pronounced into a microphone attached to a PST Serial Response Box. The microphone was tested and its sensitivity optimised. The experiment was conducted in the same way as the pre-test, with the difference that there were 2 lists of 100 words, which were presented in a fixed order.

Procedure

The participant was tested individually in a soundproof room. The English and the Dutch version of the experiment were constructed in exactly the same manner. The participant sat facing a computer screen, with a microphone placed on the table in front of him. The test items appeared in the centre of the screen in black, lowercase Courier (18 points) on a white background. Each target was preceded by a fixation point in the middle of the screen for 1000 ms. The experiment was constructed so that as soon as the participant produced any sound, the word would disappear from the screen. The screen would then be blank for 3000 ms, after which the next word would appear. This procedure gave the participant enough time to finish pronouncing the previous word. The procedure of the experiment is visually represented in Figure 2.

Figure 2. Procedure of the presentation of targets: fixation point (+) for 1000 ms, then the target until voice onset (sound), then a blank screen for 3000 ms.

The response times were measured up to the point where the participant started to pronounce the word on the screen, and were recorded using E-Prime. The actual responses were recorded using a portable voice recorder, to ensure that wrong responses could be filtered out afterwards, and to be able to check later on that the response times were not based on any sounds produced by the participant before pronunciation of the word, such as breathing or swallowing noises. Before the experiment started, the participant was presented with an instruction slide asking him to pronounce the words appearing on the screen as quickly as possible.

The participant was tested 14 times on 7 days over a period of two years, with two sessions on each day. One session would take place early in the day, the other late in the afternoon. The participant was first tested on April 17th 2007, and again the next day, to test variation between two consecutive days. The third session took place a month later to look at the variation across weeks. Another three months later, the participant took part for the fourth time, this time after a week during which he had only used his L2 English. Two weeks later the fifth session took place, this time after a week during which he had only used his L1 Dutch, and had refrained completely from using his L2 English. The sixth session took place almost a year after the fifth one, and the seventh another year after that, to look at the variation over years. Table 2 below shows a complete overview of dates and times of all 14 sessions.

Table 2. Complete overview of dates and times of all test sessions

Session   Date and time of testing
1.1       17/4/2007, 1 p.m.
1.2       17/4/2007, 6 p.m.
2.1       18/4/2007, 1 p.m.
2.2       18/4/2007, 6 p.m.
3.1       23/5/2007, 9 a.m.
3.2       23/5/2007, 4 p.m.
4.1       9/8/2007, 10 a.m.
4.2       9/8/2007, 2 p.m.
5.1       20/8/2007, 10 a.m.
5.2       20/8/2007, 1 p.m.
6.1       17/6/2008, 10 a.m.
6.2       17/6/2008, 1 p.m.
7.1       17/6/2009, 10 a.m.
7.2       17/6/2009, 5 p.m.

Throughout the rest of the article, the different test sessions will be referred to by their numbers in Table 2 above. The sessions are numbered in chronological order of testing. The first number refers to the day of testing, the number after the dot refers to the time of testing; the sessions numbered .1 are the earlier sessions of the day, those numbered .2 the later sessions.

Analyses

Before any of the analyses were conducted, extreme outliers were removed from the data, as these were caused by either missing or incorrect voice recordings. All data points with values that were more than 3 SDs from the mean were removed from the trial series. This affected less than 1% of the data.

Spectral analysis transforms the data from the time domain into the frequency domain by means of a fast Fourier transform. The best-fitting sum of sine and cosine waves is established and conveyed to a spectral plot that relates frequency to amplitude on log-log scales. The slope of the fit line in this graph is the statistic of interest; a slope of ≈ 0 shows the structure of the signal to be random (white noise), while a steeper slope of ≈ -1 indicates a fractal structure that is associated with a balance between over-random and over-regular tendencies. Spectral analysis does not allow for missing values. The outliers that had been removed left some gaps in the data, and in order to leave the time series as intact as possible, these were not substituted by other values but were simply closed by moving the data following the gap up one position.
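The two analysis steps just described can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the analysis script used for the study: it uses a plain discrete Fourier transform instead of a fast Fourier transform to stay within the standard library, it fits the line to power (squared amplitude) over all available frequencies, and it uses the population SD for the outlier criterion. All function names are illustrative.

```python
import math
import random
from statistics import mean, pstdev


def remove_outliers(series, n_sd=3.0):
    """Drop values more than n_sd SDs from the mean.

    Dropped values are not replaced, so the trials after a gap
    simply move up one position.
    """
    mu, sd = mean(series), pstdev(series)
    return [x for x in series if abs(x - mu) <= n_sd * sd]


def power_spectrum(series):
    """Return (frequency, power) pairs from a plain O(n^2) Fourier transform."""
    n = len(series)
    mu = mean(series)
    centred = [x - mu for x in series]
    spectrum = []
    for k in range(1, n // 2 + 1):  # k cycles per series, up to the Nyquist frequency
        angle = 2.0 * math.pi * k / n
        re = sum(x * math.cos(angle * t) for t, x in enumerate(centred))
        im = sum(x * math.sin(angle * t) for t, x in enumerate(centred))
        spectrum.append((k / n, (re * re + im * im) / n))
    return spectrum


def spectral_slope(series):
    """Least-squares slope of log10(power) against log10(frequency)."""
    points = [(math.log10(f), math.log10(p)) for f, p in power_spectrum(series) if p > 0]
    mx = mean([x for x, _ in points])
    my = mean([y for _, y in points])
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


random.seed(1)  # arbitrary seed, only to make the sketch repeatable
white = [random.gauss(0.0, 1.0) for _ in range(512)]
brown = []
level = 0.0
for step in white:  # running sum of white noise = random walk (brown noise)
    level += step
    brown.append(level)

white_slope = spectral_slope(white)
brown_slope = spectral_slope(brown)
print(f"white noise slope: {white_slope:.2f}")
print(f"brown noise slope: {brown_slope:.2f}")
```

On simulated data the white series should yield a slope near 0 and the random walk a clearly steeper negative slope; a reaction time series would first be passed through remove_outliers and then through spectral_slope. Published analyses often restrict the fit to the lower part of the frequency range, which this sketch does not do.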

Results: visual inspection and ANOVA

Table 3. Mean RTs in ms. for English and Dutch sessions 1, 2, 3, 6 and 7

Session   1.1   1.2   2.1   2.2   3.1   3.2   6.1   6.2   7.1   7.2
Dutch     454   471   465   470   464   481   459   472   480   483
English   529   522   514   479   516   489   510   520   509   498

Figure 3. Mean RTs in ms. with linear fit line for English (Mean EN) and Dutch (Mean NL) sessions 1, 2, 3, 6 and 7.

First of all, the mean RTs of the Dutch sessions of the experiment are consistently lower than the mean RTs of the English sessions. A repeated measures ANOVA shows this difference to be significant, F(1,169) = 354, p<0.01. What is interesting to note about Figure 3 is that the results on the Dutch sessions appear to be more stable and show less variability: the difference between the lowest mean RT (454, session 1.1) and the highest mean RT (483, session 7.2) is only 29 ms, whereas on the English sessions the difference between the lowest mean RT (479, session 2.2) and the highest mean RT (529, session 1.1) is 50 ms. One last point of interest in Figure 3 is that there seems to be a relation between the mean RTs on the Dutch and English sessions; when the RTs on the English sessions increase, the RTs on the Dutch sessions seem to decrease and vice versa. This also holds for the overall trends; RTs on the Dutch sessions increase slightly over the period of testing, while the RTs on the English sessions decrease at about the same pace. However, the correlation of -.533 reported with a paired samples t-test was not significant, p=0.11.

One way of comparing the variation between the different test sessions and languages is to look at the standard deviations (SDs) of the test sessions. Figure 4 below shows the Dutch mean RTs and SDs of sessions 1, 2, 3, 6 and 7. Figure 5 shows the mean RTs and corresponding SDs of the same English sessions.
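The ranges and the negative relation discussed for Table 3 can be checked directly from the tabulated session means. The sketch below is an illustration under the assumption that the rounded means in Table 3 are used as input; the helper `pearson_r` is our own and not part of the original analysis. Because the input is rounded, the correlation comes out at about -0.54 rather than exactly the reported -.533.

```python
from math import sqrt
from statistics import mean

# Mean RTs in ms per session (1.1 ... 7.2), copied from Table 3.
dutch = [454, 471, 465, 470, 464, 481, 459, 472, 480, 483]
english = [529, 522, 514, 479, 516, 489, 510, 520, 509, 498]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


dutch_range = max(dutch) - min(dutch)        # spread of the Dutch session means
english_range = max(english) - min(english)  # spread of the English session means
r = pearson_r(dutch, english)

print(f"Dutch range: {dutch_range} ms, English range: {english_range} ms")
print(f"r = {r:.2f}")
```

With only ten session means per language, a correlation of this size is compatible with the non-significant p-value reported in the text.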

[Figure 4: mean RTs (axis 450 to 530 ms) and SDs (axis 0 to 60) plotted per Dutch session, 1.1 to 7.2.]

Figure 4. Mean RTs and SDs for Dutch sessions 1, 2, 3, 6 and 7.

[Figure 5: mean RTs (axis 450 to 530 ms) and SDs (axis 0 to 60) plotted per English session, 1.1 to 7.2.]

Figure 5. Mean RTs and SDs for English sessions 1, 2, 3, 6 and 7.

One difference between Figures 4 and 5 is that the Dutch RTs are lower than the English RTs, as we have seen earlier. More interesting is the relation between the SDs and the RTs. Visual inspection of Figure 4 shows that the SDs of the Dutch sessions seem to follow the same pattern as the mean RTs; higher RTs seem to correspond consistently to higher SDs, and lower RTs to lower SDs. In Figure 5, the relationship between the mean RTs and the SDs for the English sessions seems to be a lot more diffuse and chaotic.

Sessions 4 and 5 are treated separately since these were controlled for context. Session 4 was conducted after 7 days of using only English, and session 5 after 7 days of using only Dutch. Comparing the RTs for both languages shows that the mean RTs on the English sessions are all higher than the mean RTs on the Dutch sessions. In this respect, then, sessions 4 and 5 did not differ from the other sessions. RTs for both languages are longer in session 4, that is, after 7 days of using only English. However, when looking at the absolute scores, it is hard to tell whether language context has affected performance. A repeated measures ANOVA with Language, Context and Time as variables was conducted to investigate a context effect further. The analysis confirmed performance on the Dutch sessions to be faster than on the English sessions, F(1,174) = 333.8, p<0.01. The main effect of Context was also significant, F(1,174) = 334, p<0.01, with slower response times on both language versions after the ‘All English’ period. The main effect of Time was not significant, which means that no significant difference was found between testing in the morning and testing in the afternoon. The interaction between Context and Language was significant, F(1,174) = 18.6, p<0.01. Figure 6 shows the interaction to be strongest in the ‘All English’ context. The interaction between Context and Time was also significant, F(1,174) = 29.1, p<0.01, with the strongest context effect occurring in the morning sessions. This interaction is illustrated in Figure 7.

Figure 6. Mean RTs for the interaction between Context and Language.


Figure 7. Mean RTs for the interaction between Time and Context.

Figure 8. Mean RTs for the interaction between Language and Time of testing.


Since both languages were always tested in the mornings and the afternoons, a repeated measures ANOVA including all 14 sessions was conducted to test whether time of testing affected RTs. The main effect of Time was again not significant. The interaction between time of testing and language, however, was significant, F(1,12) = 7.8, p=0.02. Figure 8 shows the interaction; performance on the Dutch sessions is faster in the mornings and slower in the afternoons. The English sessions show the reverse effect; performance on these is slower in the mornings and faster in the afternoons.

Results: spectral analysis
Spectral analysis estimates the slope of the line that relates amplitude of change to frequency of change on log-log scales (see the Analyses section for an explanation of spectral analysis). Let us first take a look at the spectral analysis of one single trial series. Figures 9A, B and C show the spectral decomposition of the intact trial series UK 1.1. Figures 9D, E and F show the data of the same session, UK 1.1, after it has been randomised. Randomising the data entails that the order of the time series has been completely disrupted; performance has thus been separated from the time over which it occurs. Figures 9A and D show the plain RTs after abnormalities have been filtered out (for details, see the Analyses section). Figures 9B and E show an intermediate step in which the results have been plotted on linear axes. Frequency (x) is plotted against power (y) on linear scales; this means that the size of a certain change (power) has been plotted against the number of times changes of this size occur in this particular time series (frequency). Figures 9C and F display the spectral results after frequency (x) and power (y) have been transformed to base-10 logarithmic scales.


Figure 9. Figures A-C show the spectral analysis of the intact trial series of UK 1.1. Figures D-F show the same analysis of the randomised data. A and D: Simple Reaction Time trial series; B and E: Simple Reaction Time power spectrum; C and F: Power Spectrum on log-log scales.


Figures 9A and 9D, which portray the RTs, already show an interesting difference. Figure 9A clearly has more of a pattern; on closer inspection, the trial series contains a string of U-shaped or inverted U-shaped fluctuations and consists of a nested structure of similarly shaped fluctuations. The graph shows that data points are not dispersed equally around the mean; for instance, between the 50th and 100th item there is a series of observations below the mean, while the last observations are mostly above the mean. The randomised results depicted in Figure 9D do not show such a nested structure; all observations seem to be dispersed equally around the mean, and the trial series is more jagged than the intact trial series in Figure 9A.

Figure 9B shows the results of the spectral decomposition. Frequency (x) is plotted against power, or amplitude squared (y). Low frequency in the plot is associated with high amplitude and high frequency with low amplitude. The spectral plot of the random data (Figure 9E) is much more evenly distributed; changes of every amplitude occur with roughly the same frequency. Figure 9C shows the same spectral results as Figure 9B, but plotted on logarithmic scales. Lower frequency again relates to higher amplitude, in the form of a scaling relation between frequency and amplitude. This relation is visualised in the fit lines applied to Figures 9C and 9F. The fit line in Figure 9C has a slope of -0.39, which is the scaling exponent of (partly decorrelated) pink noise. Figure 9F shows the spectral results of the random data plotted on logarithmic scales. The fit line has a slope of ≈0, which means that amplitude of change and frequency of change are not correlated. Figure 9F therefore shows the power spectrum of white noise; different frequencies all have roughly equivalent power.
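The steps walked through for Figure 9 (power spectrum, log-log transformation, fit line, comparison with a randomised copy) can be sketched as follows. This is an illustration under several assumptions and not the procedure actually used in the study: the series are synthetic, a plain discrete Fourier transform stands in for the FFT to keep the example free of external libraries, and no detrending or windowing is applied. A random walk (brown noise, slope near -2) serves only as a strongly time-dependent contrast to white noise; the pink noise reported here lies between these two extremes. The names `power_spectrum` and `spectral_slope` are hypothetical.

```python
import cmath
import math
import random


def power_spectrum(series):
    """Return (frequencies, power) for the positive frequencies of a series."""
    n = len(series)
    centre = sum(series) / n
    centred = [x - centre for x in series]  # remove the mean (zero frequency)
    freqs, power = [], []
    for k in range(1, n // 2):
        # Plain discrete Fourier transform; an FFT yields the same coefficients.
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(centred))
        freqs.append(k / n)
        power.append(abs(coef) ** 2 / n)
    return freqs, power


def spectral_slope(series):
    """Least-squares slope of log10(power) against log10(frequency)."""
    freqs, power = power_spectrum(series)
    points = [(math.log10(f), math.log10(p))
              for f, p in zip(freqs, power) if p > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Synthetic white noise: successive values are independent.
white = [rng.gauss(0, 1) for _ in range(512)]

# Synthetic random walk: each value builds on the previous one.
walk, level = [], 0.0
for step in white:
    level += step
    walk.append(level)

# Randomised copy: same values as the walk, temporal order destroyed.
shuffled = walk[:]
rng.shuffle(shuffled)

white_slope = spectral_slope(white)
walk_slope = spectral_slope(walk)
shuffled_slope = spectral_slope(shuffled)
print(f"white: {white_slope:.2f}, walk: {walk_slope:.2f}, "
      f"shuffled walk: {shuffled_slope:.2f}")
```

The point of the comparison is the same as in Figure 9: shuffling leaves the distribution of values untouched but removes their order in time, so any slope that flattens towards 0 after shuffling reflects temporal structure rather than the values themselves.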
The results of every session were separately subjected to a spectral analysis. The outcomes of the spectral analyses of each session are presented in Table 4 below.


Table 4. Spectral slope statistics for each Dutch and English session.

          Slope of the power spectrum
Session   Dutch    English
1.1       -0.22    -0.39
1.2       -0.22    -0.30
2.1       -0.44    -0.12
2.2       -0.02    -0.03
3.1       -0.03    -0.22
3.2       -0.16    -0.19
4.1       -0.17    -0.45
4.2       -0.04    -0.18
5.1       -0.38    -0.10
5.2       -0.19    -0.04
6.1       -0.21    -0.22
6.2       -0.23    -0.01
7.1        0.00    -0.31
7.2       -0.04    -0.16

The slopes on the Dutch sessions range from a minimum of -0.44 to a maximum of 0.00. The slopes on the English sessions range from a minimum of -0.45 to a maximum of -0.01. Both languages therefore show spectral slopes consistent with pink noise (slopes between 0 and -1). To test whether the results on the intact trial series do indeed consistently show a pink noise signal and differ reliably from white noise, each test session was also randomised, and the spectral analysis was repeated on the randomised data. The spectral slopes of the intact trial series shown in Table 4 were contrasted with the spectral slopes obtained from the randomised data. For the Dutch version of the experiment, all 14 intact trial series yielded steeper slopes than their randomised counterparts; a t-test confirmed this to be significant at p < 0.01 (intact trial series: M = -0.17, randomised trial series: M = 0.03). For the English version of the experiment, 13 of the 14 trial series yielded steeper slopes than their randomised counterparts; another t-test confirmed this to be significant at p < 0.01 (intact trial series: M = -0.19, randomised trial series: M = 0.00).

The means of the spectral slopes show only a subtle difference between the Dutch and the English sessions; M = -0.17 for the Dutch sessions and M = -0.19 for the English sessions. A closer look at Table 4 shows that the pink-noise signal seems to be more prominent in the morning sessions. Looking at the means of the morning sessions and the means of the afternoon sessions separately confirms this. Table 5 below shows the means for the morning and afternoon sessions per language:


Table 5. Spectral Slope means for the morning and afternoon Dutch and English sessions.

                 Dutch    English
Mean Morning     -0.20    -0.26
Mean Afternoon   -0.13    -0.13
Overall Mean     -0.17    -0.19
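The means in Table 5 can be recomputed from the per-session slopes in Table 4. The sketch below assumes the rounded Table 4 values as input and treats sessions numbered .1 as morning and .2 as afternoon; the helper `mean_slope` is hypothetical. All values agree with Table 5 to two decimals except the Dutch morning mean, which comes out at about -0.21 rather than -0.20, presumably because Table 5 was computed from unrounded slopes.

```python
from statistics import mean

# (session, Dutch slope, English slope), copied from Table 4.
slopes = [
    ("1.1", -0.22, -0.39), ("1.2", -0.22, -0.30),
    ("2.1", -0.44, -0.12), ("2.2", -0.02, -0.03),
    ("3.1", -0.03, -0.22), ("3.2", -0.16, -0.19),
    ("4.1", -0.17, -0.45), ("4.2", -0.04, -0.18),
    ("5.1", -0.38, -0.10), ("5.2", -0.19, -0.04),
    ("6.1", -0.21, -0.22), ("6.2", -0.23, -0.01),
    ("7.1", 0.00, -0.31), ("7.2", -0.04, -0.16),
]

DUTCH, ENGLISH = 1, 2  # column positions within each row


def mean_slope(column, suffix=""):
    """Mean slope for one language; suffix '.1' or '.2' selects time of day."""
    return mean(row[column] for row in slopes if row[0].endswith(suffix))


summary = {
    "Mean Morning": (mean_slope(DUTCH, ".1"), mean_slope(ENGLISH, ".1")),
    "Mean Afternoon": (mean_slope(DUTCH, ".2"), mean_slope(ENGLISH, ".2")),
    "Overall Mean": (mean_slope(DUTCH), mean_slope(ENGLISH)),
}

for label, (dutch_mean, english_mean) in summary.items():
    print(f"{label:15s} {dutch_mean:6.2f} {english_mean:6.2f}")
```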

The mean for the afternoon sessions is the same for the Dutch and the English sessions. The mean spectral slope on the morning sessions is steeper (more negative) for both languages, and especially for the English sessions the difference between the morning and afternoon sessions is considerable. The difference between the morning and afternoon sessions in the steepness of the spectral slopes is significant for the English sessions only, p < 0.01.

The context effect that was already found in comparing the RTs of sessions 4 and 5 above is also apparent in the spectral slope statistics. The spectral slope statistics of sessions 4 and 5 are repeated below in Table 6 for ease of reference. Note the interesting contrast between sessions 4 and 5.

Table 6. Spectral slopes for Dutch and English sessions 4 and 5, by language context.

Context               Session   Dutch    English
All English Context   4.1       -0.17    -0.45
                      4.2       -0.04    -0.18
All Dutch Context     5.1       -0.38    -0.10
                      5.2       -0.19    -0.04

In the ‘All English’ context, the spectral slopes on the English language sessions are much steeper than on the Dutch language sessions. In the ‘All Dutch’ context, the effect is reversed; the spectral slopes on the Dutch language sessions are much steeper than on the English language sessions. This means that the language context does not only cause the speeding up or slowing down of the RTs in a word naming task; the language context also affects the distribution of changes in the RTs. More specifically, the noise in the word naming task seems to be more strongly correlated and to show a more pronounced fractal structure after a period of using the language that is being tested. Table 6 shows one other noteworthy distinction; the context effect for both languages seems to be a lot stronger in the morning sessions than in the afternoon sessions.
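The contrast described above can be made explicit by subtracting, for each context-controlled session, the slope of the language that was not used from the slope of the language that was used. The sketch below is only an illustration built on the values in Table 6; the variable names and the idea of expressing the effect as a single difference score are our own assumptions, not part of the original analysis.

```python
# Spectral slopes for the context-controlled sessions, copied from Table 6.
# "used" is the language spoken exclusively during the preceding 7 days.
sessions = {
    "4.1": {"used": "English", "Dutch": -0.17, "English": -0.45},
    "4.2": {"used": "English", "Dutch": -0.04, "English": -0.18},
    "5.1": {"used": "Dutch", "Dutch": -0.38, "English": -0.10},
    "5.2": {"used": "Dutch", "Dutch": -0.19, "English": -0.04},
}

contrast = {}
for name, row in sessions.items():
    other = "Dutch" if row["used"] == "English" else "English"
    # Negative value: the slope of the language in use is the steeper one.
    contrast[name] = row[row["used"]] - row[other]

for name, value in contrast.items():
    time_of_day = "morning" if name.endswith(".1") else "afternoon"
    print(f"session {name} ({time_of_day}): {value:+.2f}")
```

All four differences are negative, and the two morning sessions show differences roughly twice as large as the two afternoon sessions, which is the pattern the text describes.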


Discussion and Conclusion
The main aim of this experiment was to look at what we can learn from noise and variability in L1 and L2 language production, in an effort to gain insight into how lexical knowledge may be structured in the multilingual mind and to what extent there is a noticeable influence of language context. A first visual inspection suggests that multiple languages in the mind interact closely; as performance on one language became faster, performance on the other language became slower and vice versa. The correlation of -.533 did not reach significance (p=0.11); it would be interesting to test this with more frequent measurements. The pattern would be consistent with both languages being situated in interconnected networks, in which activation of one language leads to inhibition of the other language(s) in the system, as proposed by Meara (2006). A repeated measures ANOVA was used to look at a context effect and again showed that languages may be closely interconnected in the multilingual mind. The main effect of context was found to be significant, with performance on both languages being slower after the ‘All English’ period. This means it might not be realistic to look at performance on different languages in the multilingual brain separately. Even though the L1 is always thought to be more entrenched and more automatized than the L2, both languages appear equally sensitive to language use and language context. Spectral analyses showed trial series of both languages to exhibit a fractal structure, indicating that variability in language processing is heterogeneous rather than homogeneous. They also show that measurements are interdependent and thus that language processing is continuous. As for differences between language processing in the L1 and L2, the means of the spectral slopes were very similar.

The somewhat surprising conclusion is that processing the L1 and the L2, on the surface, seems to be equally variable or stable. One might expect the L2 to be more susceptible to changes due to language use than the more entrenched L1, but the data show otherwise. Language processing is clearly very sensitive to changes in language context and language use; however, the effect seems to be as strong for the L1 as for the L2. This is a very interesting finding; after all, the linear analyses merely showed both the L1 and the L2 to speed up after a period of only using Dutch. The results obtained by the spectral analysis give some insight as to why that might be the case. As argued earlier, pink noise is found in between over-random and over-regular behaviour; between automaticity and control. The closer the noise signal approaches a scaling exponent of -1, the more controlled (and less random) the behaviour. However, looking very closely at the results in Table 6, this is not the only context effect. Not only does use of a particular language lead to more controlled performance on that language, but the scaling exponents found for the language that is not being used are also amongst the highest (that is, closest to 0) scaling exponents found. Therefore, performance on the language that is not being used shows a signal closer to white noise, associated with more random behaviour. This is in line with findings from attrition research, where language performance after an extended period of non-use is found to be increasingly erratic and random; words that are not remembered one time are sometimes suddenly remembered the next. It seems plausible that constant language use makes it easier to exert control over language performance.

Even though these are some very interesting results, this paper is no more than a first, tentative step at combining the standard linear analyses used in behavioural sciences with non-linear analyses that are more in line with continuity and dynamic self-organisation in language processing. Several caveats should be taken into account. First of all, it has been pointed out to us that the number of trials in this experiment is very low for measuring pink noise; usually, 512 trials are deemed necessary to get a reliable pink noise signal (personal communication, Van Orden, 6-9-2010). The fact that it is found here is probably due in part to very careful item selection and a very advanced and motivated participant. Since there are 14 sessions per language, and the pink noise signal is found in most of them and differs significantly from randomised versions of the data, we regard it as a reliable finding. Nonetheless, when testing participants unaccustomed to experimental procedures, it would be wise to test at least 512 items.

Another important point concerns the observed context effect. Since language performance appears to be very sensitive to language use and language context, these should be very carefully controlled for in future experiments. A very interesting direction for future research would be to see how much language input is actually needed to ‘stir up’ the language system and influence the noise signal. Since the effect is so clearly noticeable in the performance on both languages after only a 7-day period, the differences found in the noise signal of the other 10 language sessions, where context was not controlled for, may actually be due to language use in the days or hours leading up to the test sessions. A follow-up experiment could control for language use far more strictly than was done here, and give a much more exact estimate of how sensitive language performance is to language use: might a conversation, a phone call or a recently sent email be enough to upset the balance between automatic and controlled behaviour? This paper is a modest first attempt at looking into the merits of focusing on the continuity and interconnectedness of language processing. This study has emphasized the importance of thinking outside the box and beyond self-imposed category boundaries.


References
CELEX. (2001). The CELEX lexical database. Available http://celex.mpi.nl/
Elman, J. L. (2004). An alternative view of the mental lexicon. Trends in Cognitive Sciences, 8(7), 301-306.
Elman, J. L. (2009). On the meaning of words and dinosaur bones: Lexical knowledge without a lexicon. Cognitive Science, 33, 547-582.
Elman, J. L. (2011). Lexical knowledge without a lexicon? Mental Lexicon, 6(1), 1-33.
E-Prime v. 1.2 (2006). [Computer Software]. Pittsburgh: Psychology Software Tools.
Holden, J. G. (2005). Gauging the fractal dimension of response times from cognitive tasks. In M. A. Riley & G. C. Van Orden (Eds.), Tutorials in contemporary nonlinear methods for the behavioral sciences (pp. 267-318). Retrieved September 22, 2010, from http://www.nsf.gov/sbe/bcs/pac/nmbs/nmbs.pdf
Kloos, H., & Van Orden, G. C. (2010). Voluntary behavior in cognitive motor tasks. Mind & Matter, 8(1), 19-43.
Larsen-Freeman, D., & Cameron, L. (2008). Research methodology on language development from a complex systems perspective. The Modern Language Journal, 92(2), 200-213.
Lowie, W., & Verspoor, M. (2011). The dynamics of multilingualism: Levelt's speaking model revisited. In M. S. Schmid & W. Lowie (Eds.), Modeling bilingualism: From structure to chaos (pp. 267-288). Amsterdam, Philadelphia: Benjamins.
Meara, P. (2002). The rediscovery of vocabulary. Second Language Research, 18, 407.
Meara, P. (2004). Modelling vocabulary loss. Applied Linguistics, 25, 137-155.
Meara, P. (2006). Emergent properties of multilingual lexicons. Applied Linguistics, 27(4), 620-644.
Pallier, C., Dehaene, S., Poline, J. B., LeBihan, D., Argenti, A.-M., Dupoux, E., et al. (2003). Brain imaging of language plasticity in adopted adults: Can a second language replace the first? Cerebral Cortex, 13, 155-161.
Schmid, M. S. (2010). Languages at play: The relevance of L1 attrition to the study of bilingualism. Bilingualism: Language and Cognition, 13, 1-7.
Sparks, R., & Ganschow, L. (1993). Searching for the cognitive locus of foreign language learning difficulties: Linking first and second language learning. The Modern Language Journal, 77(3), 289-302.
Spivey, M. J., & Dale, R. (2004). On the continuity of mind: Toward a dynamical account of cognition. The Psychology of Learning and Motivation, 45, 87-142.
Thornton, T. L., & Gilden, D. L. (2005). Provenance of correlations in psychological data. Psychonomic Bulletin & Review, 12(3), 409-441.
Van Orden, G. C., Holden, J. G., & Turvey, M. T. (2003). Self-organization of cognitive performance. Journal of Experimental Psychology: General, 132(3), 331-350.
Verspoor, M., Lowie, W., & van Dijk, M. (2008). Variability in L2 development from a dynamic systems perspective. The Modern Language Journal, 92(2), 214-231.
Wijnants, M. L., Bosman, A. M. T., Hasselman, F., Cox, R. F. A., & Van Orden, G. C. (2009). 1/f scaling in movement time changes with practice in precision aiming. Nonlinear Dynamics, Psychology, and the Life Sciences, 13, 79-98.
