Dept. for Speech, and Hearing Quarterly Progress and Status Report

An overview of a research project and preliminary results of two experiments on perception and production of foreign language sounds by musicians and non-musicians

Pastuszek-Lipinska, B.

journal: TMH-QPSR volume: 46 number: 1 year: 2004 pages: 061-074

http://www.speech.kth.se/qpsr

TMH-QPSR, KTH, Vol. 46, 2004

An overview of a research project and preliminary results of two experiments on perception and production of foreign language sounds by musicians and non-musicians

Barbara Pastuszek-Lipinska Department of Speech, Music and Hearing, KTH, Stockholm, Sweden, and School of English, Adam Mickiewicz University, Poznan, Poland e-mail address: [email protected]; [email protected]

The paper has been accepted for a workshop "Speech perception within or outside of phonology?" - part of the 27th annual DGfS conference and will be presented on February 2005 in Cologne (Germany).

Abstract This paper presents an overview of a new research project and outlines its design, goals and main aims, and the steps that have been already taken and will be undertaken within the project. Finally, preliminary results from two pilot experiments (pitch contour tracing and listening test) are presented. The project aims at investigating the influence of musical training and education on foreign language acquisition and specifically its influence on perception and production of foreign language sounds. This paper provides behavioural evidence on the issue. Studies and analysis of gathered data are currently underway and more detailed research results will be described in future publications.

1. Introduction training, for example, can influence the process of second language (L2) acquisition. In a series of earlier experiments, a number of researchers have investigated foreign language 1.1 What we know about language speech perception and production by normal development? healthy people (for review see Best, 1995; Bohn, 1995; Strange, 1995; Hume & Johnson, 2003; Many research studies provide evidence that at and works by Escudero, 2002). However, there birth, infants are able to distinguish all phonetic are still only a few investigations of the contrasts in the world’s languages (Polka, 1995). differences between people with normal hearing Several research study results have shown that and those with a professionally trained auditory young infants aged 6-8 months have excellent system – musicians. auditory capacities and can discriminate both Although music and language production native and non-native contrasts (Trehub, 1976). (speech) are in many respects different they also By the end of the first year of life, the ability have several similarities that may be inter- seems to disappear (Kuhl, 1995; Werker & Tees, related. They are two specifically human 1984), and gradually diminishes in the following phenomena. Both are conveyed by sounds and, years. in both, the human auditory system is involved. Researchers who have examined the evo- It is not surprising that such issues attract lution of language from the first months of a researchers from different domains to investi- human’s life have reported that the ability to gate the similarities and differences between discriminate can be connected with auditory music and language (for instance, music sensitivity (Kuhl, 2000) and that this capability processing and language processing, and music is changed by speech produced by adults who perception and speech perception). have contact with an infant. In this way, Our aim is to find out if one domain language experience and exposure has been influences another and to what extent musical recognized as the most important agent that

Speech, Music and Hearing, KTH, Stockholm, Sweden 61 TMH-QPSR, KTH, Vol. 46: 61-74, 2004 Pastuszek-Lipinska, B: Perception and production .... provokes changes necessary to discriminate new non-native contrasts and speech sounds still speech sounds. exists throughout the human life span but is Interestingly, the auditory acquisition and subject to “reorganization” (Diamond et al., thus the prosodic acquisition of a particular 1994; Polka, 1995; Werker & Pegg, 1992). So, it language seems to be completed before a baby seems that the human capacity to discriminate starts talking (first words by the end of the first foreign language sounds exists but is limited by year of its life) and all without any formal lack of experience and stimulation during many instructions. years of life. There is a consensus that language experi- In support of this evidence, several labo- ence plays an important role and has a crucial ratory studies have recently shown that impact on speech perception and production perception of non-native speech sounds can be (Gandour et al., 1998; Zhang et al., 2000). Many learned through special training (Dziubalska- approaches to L2 perception suggest that first Kolaczyk, 2003; Kjellin, 1999; Lively et al., language (L1) background plays an important 1994; Sari, 1996) and that the human auditory role also in L2 perception and has influence on cortex can be shaped by the manner in which the way in which non-native sounds are sounds are combined into words and sentences perceived (Ingram & Park, 1997). (phonological grammar)1 (Jacquemot et al., There is a consensus that children are able to 2003) that support neuropsychological evidence learn a particular language by only listening to on plasticity in the brain and human auditory its production, although during the process of functions. language acquisition they have to detect the Thus, adults have to work harder than relevant acoustic cues and then the relative children on acquiring a new language but, importance that the cues have in speech according to new research results, adults like production (Nittrouer, 2000). children can achieve native-like production of In speech perception people, i.e., native foreign language sounds. speakers usually are not conscious of the As scientists from other domains interested acoustic cues connected with their mother in cognitive science say, we can agree that tongue. “earlier means better” but we should also like to Studies in second language learning have say that to enable appropriate stimulation is to also found that listeners are more efficient at enable an environment for changes, possibilities perceiving sounds of their mother tongue than and new challenges. those of a new language acquired later in life Researchers also emphasize the important (e.g. Best et al. 1988; Best, 1994; Dupoux et al., role of prosody in second language speech 1997; Harnsberger, 2001; Hume & Johnson, perception and production (Chun, 1998; Kjellin, 2003; Polka & Werker, 1994; Strange, 1995). It 1999). They point out that when students start to seems, however, that first language has an learn a new language, some time is usually impact on second language and the process of a needed in order to learn to pronounce phonemes perceptual attunement to the native language that are not present in their native language. Yet contrasts decreases the ability of discrimination experience shows that a person with a good of some non-native contrasts. organization of a discourse, good grammar, and People learning new languages in adole- good knowledge of words and their meaning scence or adulthood often have difficulty in who lacks correct timing and pitch will be discrimination of non-native contrasts; the harder to understand. prosodic acquisition of a new language seems to So, it seems that there is a need to examine be difficult (Werker & Tees, 2002). which agents can aid the process. Adults who start learning a new language, It would be also interesting to find out under and already know in practice at least their native which conditions human ability to discriminate language, have to face many boundaries, different language sounds is maintained and because the process of maturation of their trained across the life span. auditory system is completed (Rauschecker, 2003). Yet, research studies show that adults compare poorly with children in phonetic units discrimination.

Studies of adult subjects have shown, 1 Phonological grammar (Kay, JD 1989, Phonology: however, that the auditory ability to discriminate a cognitive view: LEA: Hillsdale, NJ.)

62 TMH-QPSR, KTH, Vol. 46, 2004

1.2 Music in human life and cognition Researchers working in speech perception as stimuli in treatment and therapy and who are interested in speech development have found that infant-directed speech is in a Music plays an important role in all cultures number of aspects music-like (e.g. regular and, similar to language, is a means of commu- rhythms, slow tempo, pitch contours expanded nication between people. Music accompanies and repeated with altered lexical or segmental our everyday life and several research studies content and varying tempo, extended vowels (as have been so far devoted to the investigation of in song)) and finally musical qualities. Linguists its influence on several aspects of human life. also claim that infant-directed speech has a Music is already used in different kind of distinct suprasegmental structure (for review trainings that aim to improve health, e.g. after see: Trehub & Trainor, 1993). brain injuries, during mental and psychological Fernald (1992) even claimed that: “long trainings. before they understand language, babies find out As a good example, results of research what’s on their mothers’ minds through studies at the Institute for Music Research, emotional clues in the parent’s exaggerated, University of Texas and the Research Imaging musical speech”. So, according to this view, the Center, University of Texas Health Science musical components seem to be crucial in Center may be cited, where researchers language acquisition and development. examined the influence of melodic intonation Thus, on the basis of this consideration it is therapy on fluency of speech after brain injury valid to ask whether these musical features of and reported very promising results. infant-directed speech are crucial for language Researchers from the Ohio State University development, and whether musical expertise is a have newly examined how patients complete a means that helps in L2 sound perception and verbal fluency test while cardiac rehabilitation. production. In other words, to what extent is Music is used as stimuli and researchers have musical stimulation a factor that influences and reported significant improvement. Emery helps language acquisition? suggested, “listening to music may influence cognitive function through different pathways in 1.4 Musical education and training – the brain“ (Emery et al., 2003). Music is also used in foreign language class- overall characteristic rooms as a means to improve, for example, is a complex process that verbal memory and intonation (Chan et al., involves many different dimensions. It is based 1998). among other things on development of specific If short-time exposure (only listening auditory functions. Students of music are trained conditions) can provide such a positive effect it in pitch discrimination, rhythm and timing would not be perhaps useless to ask about a accuracy. The most important part of this kind long-time exposure (e.g. musical training and of education is to focus students’ attention on education). sounds. The human auditory system and So, in our study we are particularly interested memory are involved in this process. Emotions in healthy adults and their perception and that musicians would like to show, and/or production of sounds of a new language, and in generate in the audience, ensure the final shape the influence of long-term stimulation, such as of the music performance. musical training, on second language acquisition. 1.5 Does music make changes? - Neurological evidence 1.3 What we know about the music A number of researchers provided evidence on impact on the language development? the sensitive period both in music and in Can we find any impact on language of music at language. Researchers from the neurological the stage of the first language acquisition? Let domain have gained insight into the human us start from the beginning. As we have already brain, due to newly available technologies, and said children seem to be masters of sound have provided evidence that human brain is perception, but there is still the question of why plastic during all life-span (Syka, 2002). and how the process proceeds and which Works investigating similarities and differ- components are crucial in the process. ences between musicians and nonmusicians at the level of the brain are currently very

Speech, Music and Hearing, KTH, Stockholm, Sweden 63 TMH-QPSR, KTH, Vol. 46: 61-74, 2004 Pastuszek-Lipinska, B: Perception and production .... advanced and show important differences (both education and training and with and without qualitative and quantitative) between the two different languages competence were examined. groups of subjects (for review see Peretz, 2003; All subjects were recruited in Lodz and and Schlaug, 2003). The brains of musicians Kutno areas and participated in the study after have been recognized as possessing “exceptional stated consent and on a voluntary basis. neuroplasticity” (Munte et al., 2002). All subjects were native-speakers of Polish However, only a few studies have been aged from 15 to 69 years (mean 32). All subjects devoted to the investigation of differences and reported themselves to have normal hearing similarities between musicians and non- although some of them filled out in musicians at a behavioural level. questionnaires that had some hearing related If at the first stage of L1 acquisition, musical illnesses (e.g. otitis, other temporal impair- cues of speech that enable language acquisition ments) in the past, also subjects advanced in age play an important role. So perhaps musical could have age-related hearing changes. training that provokes qualitative changes in the At this stage of design of the project we human brain facilitates discrimination of foreign intended to have two groups: the first composed language. of non-musicians and the second composed of It seems to be interesting to examine also professional musicians (having at least reached behavioural changes that may follow the the level of secondary music school which is changes mentioned above (Peretz, 2003 etc). normal in Poland after 10-12 years of In order to investigate the question of education). behavioural changes, we developed a research After gathering background data we noticed project and carried out two experiments. that our first division (of musical competence) did not sufficiently describe the subjects. For 2. Research design and instance some professional musicians after 10- 12 years of education are currently not active in methodology music, and some subjects that claimed to be 2.1 Research question and hypothesis: non-musicians had some musical experience at childhood; there is also a small group of subjects Question: Do musicians meet fewer problems – non-musicians who even without any formal with discrimination of non-native speech training perform as unprofessional musicians- sounds? amateurs. Hypothesis: People with a musical background It seems that there are still some subjects will have fewer difficulties with perception and who cannot be classified according to our production of new speech sounds as a result of current division. constant auditory stimulation during musical Finally, however, our subjects were divided education and training. into 4 groups: 1/ musicians with at least 8 years of musical 2.2 Experimental design and education and still active in music; in this group procedure has been also classified professional organist who started his professional carrier after 6 years 2.2.1 The Corpus of formal training, 82 sentences in 6 languages (English: American, 2/ people active in music as amateurs – both British, Belgian Dutch, French, Italian, Spanish: with and without a musical background, European and from South America, and 3/ subjects with some musical experience in Japanese) have been synthesized for the corpus. childhood but not active in music, The ScanSoft® RealSpeak™ application was 4/ subjects without any musical background. used for the purpose. Among sentences were questions, statements and orders. All sentences 2) Procedure were recorded on CD (3 times repeated with Subjects were asked to repeat as accurately as pauses between them). they could some synthetic foreign-language sentences played by a CD player (Grundig) 2.2.2 Research data placed in a quiet area. No other information was 1) Participants given to the subjects. Conducted in Poland, a group of 106 subjects, Polish nationals, with and without musical

64 TMH-QPSR, KTH, Vol. 46, 2004

Participants

nonmusicians

people with musicial 39 background but not active in music 53 active amateurs

7 musicians 7

Figure 1. Subjects

All sentences produced by the subjects were harmonic perception - 4 examples: 1 sound, an recorded with Sharp MD-MT200 portable accord of three sounds with the middle sound to recorder and UNITRA-Tonsil Microphone repeat, 2 sounds with the low sound to repeat MCU-53 with a linear characteristic. All and finally an accord of three sounds with the subjects also filled out questionnaires with highest sound to repeat, comparison of two information on their own sex, age, education melodies with only small changes (both in (including start date of musical education and rhythm and in pitch), comparison of two times training and contact with particular languages), produced short melody in major key and then in music exposure, occupation, job, interests, as minor key, and 4 rhythms to reproduce (clap well as health (we asked subjects to give with hands). information on the eventual hearing problems The test of musical skills was based on and all illnesses which could produce negative standard entrance tests to music schools in effects on hearing). Poland, so we can assume that all musicians Subjects without musical background also were able to do this without any difficulties. The participated in a simple test of musical assumption is based on real-life cases, e.g. it is competence and abilities (Pastuszek-Lipinska, not possible to start and then continue musical 2003) prepared with the following tasks: 5 education without passing the described test. sounds to repeat, 4 words to sing, test of

MELODY RHYTHM HARMONY MEMORY 1 NONMUSICIANS 0,9 0,8 0,7 PEOPLE WITH M USICAL 0,6 BACKGROUND BUT NOT ACTIVE IN M USIC 0,5 ACTIVE AM ATEURS 0,4 0,3 0,2 PROFESSIONAL M USICIANS 0,1 PERCEPTUAL RATING RATING PERCEPTUAL 0

Figure 2. Results of the test of musical skills.

Speech, Music and Hearing, KTH, Stockholm, Sweden 65 TMH-QPSR, KTH, Vol. 46: 61-74, 2004 Pastuszek-Lipinska, B: Perception and production ....

3) Analysis of the data basis of the evidence we can claim that prosodic All the information gathered has been organized features (suprasegmental features) play crucial and memorized into a database. The next steps role in language acquisition and e.g. intonation – are analyses by means of statistical analysis and melody of a language express not only the results are plotted in a series of graphs and grammatical meaning (statement, question, tables with the support of Analyze-it +General order) but also emotions. 1.71 for Microsoft Excel program. Yet, in a study by Patel et al. (1998) authors suggested that the same neural resources may be shared in the processing of F0 contours in 3. Experiments speech and in melody contours in music and that 3.1 Experiment 1 - Pitch contour perhaps there may be also similarities in pitch tracing contours. When we come back to language develop- 3.1.1 Pitch contour – background ment and analyze the role of pitch contour information processing we should take into account that pitch contour processing may be an important The American National Standards Institute perceptual organizational device for infants, (1973) defines pitch as “…the attribute of which helps them in organization of complex auditory sensation in terms of witch sounds may auditory patterns both in musical and speech be ordered on a scale extending from high to sequences and particularly prosodic aspects of low”. speech (pitch, stress, and junction patterns) as Pitch is one of the structural patterns that are Trehub & Trainor (1993) suggested. present both in speech and music. So, on the basis of the presented data it In speech, a pitch contour is a continuous would be interesting to find out whether in melody-like pattern and is directly related to the foreign language people with long training and vibration rate of the vocal cords tones, and acts stimulation of auditory system through musical as localised semantic and grammatical infor- training and education perceive and produce mation. foreign language sounds more appropriately This melodic contour in speech is also the (meaning closer to native-like production or to a perceptual correlate of fundamental frequency given sample or at least more natural due to the (F0) over time, commonly called “intonation”. stimulation and sounds sensitivity). When we focus on speech production in the context of the current research project we cannot ignore the following findings and research 3.1.2 Experiment design and procedure results: 1) Subjects • First, on human beings ability to the Recordings with 106 speakers were analyzed. “voluntary control over the precise acoustic One person was excluded because he did not patterns of vocal expression”. As well as on repeat the second word of the sentence. Due to the fact that this ability plays a crucial role in the significant different sizes of the groups only both language and music, and particularly in two, the biggest and simultaneously the most speech production (Juslin & Laukka, 2003). important for our investigation, have been • Second, on the effect of superior auditory chosen for current results presentation: pro- skills (such as in musicians) on vocal fessional musicians (53) and nonmusicians (38). accuracy which reported that musicians even 2) Material without vocal training showed greater vocal The research data was the question “May I help production accuracy (Amir et al., 2003). you?” (US English) produced by each speaker. • And third, that shown “the auditory-vocal system controlling voice F0 seems to operate 3) Procedure mainly at the supra-segmental level, not at In order to investigate similarities and differ- the syllabic level” (Natke et al., 2003). ences between musicians and nonmusicians in pitch contour accuracy fundamental frequency Prosodic (suprasegmental) features of a (F0) of each word from the sentences produced given language have been very often said as by speakers have been extracted and analyzed. being crucial for L2 acquisition. Thus, on the

66 TMH-QPSR, KTH, Vol. 46, 2004

Soundswell 4.0 program has been used in b. Discussion analysis of F0 of the question “May I help you?” The results achieved cannot be accepted as final produced by each speaker. because of some additional aspects. The first 4) Data analysis and very important aspect seems to be the described above role of experience in language a. Results acquisition. English is a very popular language, which Several analyses were performed: could be heard accidentally by all subjects so it 1. Mean and max fundamental frequency (F0) seems that analysis of other languages or data of each word in the researched sentence. with F0 manipulated could provide other results. 2. Calculations of changes in mean F0 between The second aspect relates to results of similar syllables with semitones (in order to research project newly published by Schön et al. eliminate differences in frequencies between (2004). The authors examined whether extensive female and male voices), standard deviation musical training facilitates pitch contour from the sample recording and correlation processing not only in music but also in with the sample. language. Their results provide evidence that 3. And then data have been analyzed with extensive musical training influences the statistical functions (SPSS and Analyze- perception of pitch contour in spoken language. it+General 1.71 for Microsoft Excel programs). c. Conclusions At this stage no significant differences in pitch No significant differences between musicians contour accuracy between groups, and and non-musicians’ production at the level of especially between two of the biggest groups pitch contour were found. (musicians and non-musicians), have been Figure 3 shows results (mean, standard found. Both musicians and non-musicians did deviation and standard error-tables) within two not have difficulties with pitch accuracy and of the biggest groups of subjects: professional intonation in the described sentence. Current musicians and non-musicians. The graphs show research project will be repeated with: smaller standard deviations of the subjects’ production group, natural stimuli and with languages (mean F0 between syllables with semitones) unknown for all subjects. from the original sample.

n Mean SD SE 95% CI of Mean NONMUSICIANS 38 2.520 1.6777 0.2722 1.968 to 3.071 PROFESSIONAL MUSICIANS 53 2.644 1.4069 0.1932 2.256 to 3.032

9 8

7

6

5

4

3

2

1

0

-1 NONMUSICIA N PROFESSIONA L MUSICIA N PERCEPTUAL RATING Figure 3. Graphs of t-test analysis of the relation between the groups and the dependent variable (correlation with the sample recording).

Speech, Music and Hearing, KTH, Stockholm, Sweden 67 TMH-QPSR, KTH, Vol. 46: 61-74, 2004 Pastuszek-Lipinska, B: Perception and production ....

3.2 Experiment 2 - Listening test test was prepared and a computerized listening test (program Judge by Svante Granqvist2) was 3.2.1 Experimental design and procedure run with a panel of listeners with the aim to 1. Subjects investigate how native speakers of English Group of subjects consisted of 7 native speakers perceive musicians’ and non-musicians’ produc- of English, namely 4 colleagues from KTH, 1 tion. The subjects’ tasks were to rate, for each from Indiana University (US) and 2 from York stimulus, on a Visual Analogue Scale whether University (England). 4 females and 3 males, they felt that they heard bad or good English and aged from 31 to 62 (mean 49). All subjects put results on the scale with a cursor. The participated as listeners in the current test on a subjects were presented with a visual analogue voluntary basis. rating scale on the computer display, the scale ranges from 0 marked ‘bad’ to 1000 marked 2. Material ‘good’. Each subject was presented with a Research material was the sentence “May I help differently randomized list of stimuli. The you?” repeated by 106 speakers – native program recorded all response settings. The test speakers of Polish. was preceded by a short introduction explaining that recorded subjects were native speakers of 3. Procedure Polish who heard synthetic stimuli and after In order to relate acoustical cues to human three repetitions reproduced heard sentences perception, listening tests are undertaken and with taking into account both segmental (vowels different rating methods are used. Taking into and consonants) and suprasegmental (intonation, account a large number of speakers in the rhythm, stress, and rate) features. During the current study, the Visual Analogue Scale (VAS) test, subjects could listen as many times as they has been chosen as a verified means to measure needed to a given stimulus before giving a score. a variety of subjective phenomena. A listening The task lasted around 20 minutes.

n Mean SD SE 95% CI of Mean NONMUSICIANS 39 341.370 163.3693 26.1600 288.412 to 394.328 PROFESSIONAL MUSICIANS 53 488.720 139.3499 19.1412 450.310 to 527.129

Median IQR 95% CI of Median 307.714 220.143 262.857 to 384.714 508.571 180.714 463.857 to 541.857

800

700

600

500

400

300

200

100

0 NONMUSICIA N PROFESSIONA L PERCEPTUAL RATING MUSICIA N

Figure 4. Graphs with t-test analysis for dependent samples illustrating the relation between grouping and the dependent sample (mean perceptual2 The rating computer scores program given by ”Judge” listeners). is an implementation of the Visual Analogue Scale on computer. Author: Svante Grangvist, PhD.

68 TMH-QPSR, KTH, Vol. 46, 2004

4. Data analysis 2) We have also found significant correlation In the current paper, however, all data have been between musical abilities and average scores involved due to the different sizes of the groups. given by listeners. When differences between the groups are presented, only two the biggest groups are Results of 2-tailed Pearson correlation test involved (39 non-musicians and 53 professional show that p-value (statistical significance) is musicians). 0.0016 and is highly significant. This may Data have been analyzed with SPSS and suggest that musical skills also play a role in the Analyze-it + General 1.71 for Microsoft Excel processes of speech perception and production. programs. Test – Pearson correlation a. Results 1) We have found significant differences in n 106 musicians’ and non-musicians’ production r statistic 0.30 of the researched question, “May I help 95% CI 0.12 to 0.47 you?” Listeners have perceived musicians as more fluent than non-musicians. 2-tailed p 0.0016 (t approx.)

Figure 4 presents results achieved for these two of the biggest groups. Tables with Pearson correlation was performed in order descriptive comparative analyses and graphs to measure relationship between mean rating by between the groups are presented above. listeners (average), and musical skills (taking According to listeners – native speakers of into account perceptual rating scale (range from English’ scores the differences between the two 0 to 1)). groups are very significant.

800

700

600

500

400 AVERAGE 300

200

100

0 0,1 0,3 0,5 0,7 0,9 1,1 MUSICAL SKILLS

Figure 5. Pearson correlation results between mean production according to listeners’ scores and musical skills.

Speech, Music and Hearing, KTH, Stockholm, Sweden 69 TMH-QPSR, KTH, Vol. 46: 61-74, 2004 Pastuszek-Lipinska, B: Perception and production ....

3) We have also found correlation between subjects’ production and language experience.

Results of 2-tailed Pearson correlation test Test - Pearson correlation presented in Figure 6 that measured correlation between mean rating by listeners (average) and n 106 language experience also show significant p- r statistic 0.36 value that is 0.0001. 95% CI 0.18 to 0.52

2-tailed p 0.0001 (t approx.)

800

700

600

500

400 AVERAGE 300

200

100

0 -55 15253545 ENGL ISH

Figure 6. Pearson correlation - results showing correlation between English experience and mean rating.

1200

1000

800

600

400

200

0 PERCEPTUAL RATING -200 A B C DE FG

LISTENERS

Figure 7. Graphs with comparative analysis of listeners’ mean rating scores including all subjects (speakers).

70 TMH-QPSR, KTH, Vol. 46, 2004

Figure 7 presents scores given by all 4.2 Synthetic stimuli versus natural participants of listening test. The scores differ stimuli between subjects and it seems that their personal In the current study, synthesized sentences level of acceptability plays an important role in generated by a text-to-speech system were used the evaluation process. as stimuli3. Although that kind of stimuli is now In order to evaluate reliability of obtained very popular, and a number of researchers have data we used Cronbach’s Alpha (a tool that is already utilized computer-synthesized stimuli as used for assessing the reliability of scales, in research material, there are, however, still some other words for assessing the consistency of doubts about the influence of unnatural speech received data). sounds on the results. Also in question is the When data have multidimensional structure influence of that kind of stimuli intelligibility. this parameter is usually low. In our study Researchers have conducted several research however Cronbach’s Alpha on standardized projects in order to analyze perception of items is very high α synthetic versus natural stimuli and, on the basis =0,839 of the data; it is known that there are several advantages (e.g. the possibility to analyze On the basis of this parameter we can see separately a given parameter, to manipulate F0) that the results are very consistent and all but also several disadvantages (e.g. the above- listeners’ scores have unidimensional structure. mentioned artificiality of the sounds). Bailly (2003) carried out several experiments 4. General discussion – on that had as a main goal to compare natural experimental data and versus synthetic speech in shadowing speech (meaning imitated just after heard stimuli). procedure He observed that an average delay (time In recent years, several tests have been carried needed to start imitate speech) is different for out in different conditions: e.g. only hearing, natural stimuli (less than 50 ms) than for visual and audiovisual conditions. synthetic speech (exceeds 100 ms); he claimed Experiments have also been carried out that that this could be mainly due to prosody involved a speaker’s face and accompanying differences and that actual text-to-speech gestures as well as speech sounds (Traunmuller, systems still generate “inappropriate and 1998; Massaro, 2001). impoverished prosody”. The current study used only the auditory Before the samples were generated for the signal as a perceptual cue. The aim of the study purpose of the current research project a number is to examine behavioural response of musicians of text-to-speech systems were examined and and non-musicians on foreign language sounds. the ScanSoft® RealSpeak™ application was chosen as the most intelligible (in our opinion) 4.1 Subjects and their attitude system, even though some sentences still sound artificial. Although all subjects gave their consent to the But yet, only a few of our subjects realized participation in the project their dispositions that they heard non-natural voices. were different. Several subjects were nervous of None of our subjects was familiar with this kind of experiment, some of them were speech synthesis. convinced that their performance would be So, it seems to be important to note that in negative, others were doubtful and almost the listening experiment reported below, our convinced that they cannot be useful as a subjects in fact repeated the imitations of subject. Some of our subjects claimed that the synthetic stimuli and not natural human voices. task was too difficult for them. This may have influenced our results. We think it is an important element that We can claim that subjects – both musicians could influence the results negatively because and non-musicians have the same starting point motivation plays an important role in but the question as to whether and how that performance. might influence results still exists. So, the

3 See Corpus

Speech, Music and Hearing, KTH, Stockholm, Sweden 71 TMH-QPSR, KTH, Vol. 46: 61-74, 2004 Pastuszek-Lipinska, B: Perception and production .... experiment will be repeated in the future also like production”, so that to help listeners in with natural stimuli. evaluation process. • Training of listeners before starting whole 4.3 Work with a microphone test – showing how big differences between Another problem is connected with set-up of a subjects are possible (6-10 the most varied microphone. Not all people are familiar with samples will be presented at the beginning). microphones and sometimes, even after clear instructions that the distance between the 5. Conclusions microphone and the mouth should remain Steps undertaken so far and described above unaltered throughout the experiment (in this could be affected by some weaknesses, so it case 20 centimetres), the subjects attempt to seems reasonable to repeat some of the move the microphone. As a result, we obtained experiments and to introduce some changes in some recordings that were either too loud or too the procedure. It would be also advantageous to soft. In the future, a fixed distance will be take into consideration all suggestions by ensured. colleagues who participated in the project and Such difficulties in recording are well known who gave their opinions about the proposed to researchers working in the domain of sound approach. analysis; we intend here, however, to show all Research results have shown some directions potential risks and weaknesses of our research on aspects that should be re-examined and results, as well as to highlight how we shall reanalyzed in order to provide data as reliable as introduce all necessary changes in the future possible. research studies. The preliminary results of the experiments show evidence that in some way confirms the 4.4 Using VAS and data reliability hypothesis on positive influence of musical The next doubt is connected with applied scale training and education on second language itself, which provoked difficulties and could be acquisition (SLA) and particularly on foreign a means of potential mistakes and incon- language sounds perception and production. sistencies. The number of samples is very big However, there is still a need to introduce some and it seems to be a difficult task to precisely changes and improvements to the methodology evaluate them. in order to receive more reliable research results Listeners claimed after the test that after they as well as to continue in depth analysis of all had evaluated a given subject as very bad (e.g. gathered data. Although the evidence is small it 0) then in some cases they found other subjects is, at this point, worth pursuing the question of even worst but they did not have a scale for whether there is an influence of musical training lower evaluation. One of the listeners (subject F) on second language acquisition. We will thus used only a half of the VAS scale. repeat experiments with other languages and In order to avoid in the future problems with develop some new experiments with natural using the VAS by listeners (see page 11), stimuli. We also intend to observe our two information on the whole scale use will be groups while spontaneous speech (when subjects added to the instructions before the next know a given language) and after some training listening tests. period, so that to achieve complex data on how In order to eliminate the problems described perception and production of foreign language above in the future the following suggestions sounds is influenced by musical education and and remarks of listeners will be introduced: training.

• List of instructions not only oral but also on 6. Acknowledgements paper so that to ensure they are the same for each listener. These investigations are supported by EU Marie • More detailed information about rating range Curie Fellowship contract No. HPMT-CT-2000- – it seems that scale from bad to good is not 00119. The author would like to thank all enough, so in the next listening test the two participants for their kind participation all definitions will be additionally described: remarks and patience. To Dr. Peta Sjölander, Dr. “barely understandable” and “almost native- Svante Granqvist, Prof. Johan Sundberg and Prof. Robert McAllister for support and valuable

72 TMH-QPSR, KTH, Vol. 46, 2004

discussions, and to Dr. Joakim Westerlund for Journal of Acute and Critical Care 32/6 help with the statistical analyses. November/December (abstract). Escudero P (2002). The perception of English vowel contrasts: acoustic cue reliance in the development 7. References of new contrasts. In: Leather J & James A (Eds). Amir O, Amir N & Kishon-Rabin L (2003). The effect Proceedings of the 4th Symposium on the of superior auditory skills on vocal accuracy. JASA Acquisition of Second Language Speech, New 113/2: 1102-8. Sounds 2000, University of Klagenfurt, 122-131. Bailly G (2003). Close shadowing natural versus Escudero P (2002). Learning the sounds of a new synthetic speech. Kluwer Academic Publishers. language: Adults can do it too! Lay language International Journal of Speech Technology 6: 11- article. In: World Wide Press Room of the 19. Acoustical Society of America, Cancun Edition. Best CT (1994). The emergence of native-language Escudero P & Polka L (2003). A Cross-language study phonological influences in infants: A perceptual of vowel categorization and vowel acoustics: assimilation model. In: HC Nusbaum & J Canadian English versus Canadian French. 15th Goodman (Eds). The development of speech ICPhS Barcelona: 861-864. perception: The transition from speech sounds to Fernald A (1992). Speech on the annual meeting of the spoken words. Cambridge, MA: MIT Press, 167- American Association for the Advancement of 224. Science (AAAS) in Chicago on Feb. 8. Best CT (1995). A direct realist view of cross- Gandour J, Wong D & Hutchins G (1998). Pitch language speech perception. In: Strange, W. (ed.). processing in the human brain is influenced by Speech perception and linguistic experience: language experience, Neuroreport 9/9: 2115-2119. Theoretical and methodological issues. Baltimore: Harnsberger J (2001). The perception of Malayalam York Press. 171-203. nasal consonants by Marathi, Punjabi, Tamil, Best C, McRoberts G & Sithole N (1988). Oriya, Bengali, and American English listeners: A Examination of perceptual reorganization for multidimensional scaling analysis. Journal of nonnative speech contrasts. Journal of Experi- Phonetics 29: 303-327. mental Psychology 14: 346-360. Hume E & Johnson K (2003). The impact of partial th Bohn, O.-S. 1995. ‘Cross language speech perception phonological contrast on speech perception. 15 in adults: first language transfer doesn’t tell it all’. ICPhS Barcelona: 2385-2388. In Strange, W (ed.). Speech perception and Ingram JCL & Park S-G (1997). Cross-language linguistic experience: Theoretical and vowel perception and production by Japanese and methodological issues. Baltimore: York Press. 279- Korean learners of English. Journal of Phonetics 304. 25: 437-470. Chan Agnes S, Yim-Chi Ho & Mei-Chun Cheung Jacquemot Ch, Pallier Ch, LeBihan D, Dehaene S & (1998). Department of Psychology, The Chinese Dupoux E (2003). Phonological grammar shapes University of Hong Kong, Nature, vol 396: 128. the auditory cortex: a functional Magnetic Reso- Chun D (1998). Signal Analysis Software for nance Imaging study. J Neurosci 23 (29): 9541-6. Teaching Discourse Intonation. Journal of Juslin PN & Laukka P (2003). Emotional expression Language Learning & Technology 8/1. Vol. 2, No. in speech and music: evidence of cross-modal 1: 61-77. similarities. Ann NY Acad Sci. 1000: 279-282. Diamond A, Werker JF & Lalonde C (1994). Toward Kjellin O (1999). Accent addition: Prosody and understanding commonalties in the development of perception facilitates second language learning. In: object search, detour navigation, categorization, Fujimura O, Joseph BD & Palek B (Eds). and speech perception. In: Dawson G & Fischer Proceedings of Lp'98 (Linguistics and Phonetics KW (Eds), Human behavior and the developing Conference) at Ohio State University, Columbus, brain. New York, NY: Guilford, 380-426. Ohio, Sept 1998. Prague: The Karolinum Press. 2: Dupoux E, Pallier C, Sebastian N & Mehler J (1997). 373-398. A destressing 'deafness' in French? Journal of Kuhl PK (1995). Mechanisms of developmental Memory and Language 36: 406-421. change in speech and language. Proceedings of Dziubalska-Kołaczyk K (2003). How learners “repair” ICPhS95, Stockholm, 2: 132-139. second language phonology and whether they may Kuhl PK (2000). A new view of language acquisition, become native speakers. In: Waniek-Klimczak E & PNAS, October 24, 97/22: 11850-11857. Włodzimierz S (Eds). Dydaktyka fonetyki języka Lively SE, Pisoni DB, Yamada RA, Tohkura YI & obcego. Neofilologia, tom V Płock: Zeszyty Yamada T (1994). Training Japanese listeners to Naukowe PWSZ w Płocku. identify English /r/ and /l/: III. Long-term retention Emery CF, Hsiao ET, Hill SM & Frid DJ (2003). of newphonetic categories. JASA 96/4: 2076-2087. Short-term effects of exercise and music on Massaro DW (2001). Speech perception. In: Elsevier cognitive performance among participants in a Science Ltd, International Encyclopedia of the cardiac rehabilitation program. Heart & Lung: the Social and Behavioral Sciences, 14870-14875.

Speech, Music and Hearing, KTH, Stockholm, Sweden 73 TMH-QPSR, KTH, Vol. 46: 61-74, 2004 Pastuszek-Lipinska, B: Perception and production ....

Munte TF, Altenmuller E & Jancke L (2002). The in both music and language, Psychophysiology, musician’s brain as a model of neuroplasticity, 41/3: 341 (abstract). Nature Neuroscience, June, 3: 473-478. Strange W (1995). Cross-language studies of speech Natke U, Donath T & Kalveram K (2003). Control of perception: A historical review. In: Strange W (Ed) voice fundamental frequency in speaking versus Speech Perception and Linguistic Experience. singing, JASA 113/3 :1587-1593. Baltimore: York Press, 3-45. Nittrouer S (2000). Learning to apprehend phonetic Strange, W (ed.) (1995). Speech perception and structure from the speech signal: the hows and linguistic experience: Theoretical and methodo- whys. Paper presented at the VIIIth Meeting of the logical issues. Baltimore: York Press. International Clinical Phonetics and Linguistics Syka J (2002). Plastic changes in the central auditory Association, Queen Margaret University College, system after hearing loss, restoration of function, Edinburgh. and during learning. Physiol Rev 82: 601–636. Pastuszek-Lipińska B (2003) Unpublished test of Traunmuller H (1998). Modulation and demodulation musical skills. in production, perception, and imitation of speech Patel A, Peretz I, Tramo M & Labreque R (1998). and bodily gestures. Proceedings of FONETIC 98, Processing prosodic and musical patterns: A neuro- Dept of Linguistics, Stockhlom University, 40 – psychological investigation, Brain and Language 43. 61: 123-144. Trehub SE (1976). The discrimination of foreign Peretz I (2003). Brain specialization for music: new speech contrasts by infants and adults, Child evidence from congenital . In: Peretz I & Development, 47: 466-472. Zatorre R (Eds). The cognitive neuroscience of Trehub S & Trainor L (1993). Listening strategies in music, Oxford University Press, 192-203. infancy: the roots of music and language Polka L (1995). Developmental patterns in infant development. In: McAdams S & Bigand E (Eds) speech perception. Proceedings of ICPhS95, (1993). Thinking in sound: the cognitive Stockholm 2: 148-155. psychology of human audition. Oxford University Polka L & Werker J (1994). Developmental changes Press, 278-327. in perception of non-native vowel contrasts. Werker JF & Pegg JE (1992). Infant speech perception Journal of Experimental Psychology: Human and phonological acquisition. In: Ferguson C, Perception and Performance, 20/2: 421-435. Menn L & Stoel-Gammon C (Eds). Phonological Rauschecker JP (2003). Functional organization and Development: Models, Research, and Implications. plasticity of auditory cortex. In: Peretz I & Zatorre Parkton Maryland: York. R (Eds). The cognitive , Werker J & Tees R (1984). Phonemic and phonetic Oxford University Press, 357-365. factors in adult cross-language speech perception, Report from a November 1998 meeting of the Society JASA 75/6: 1866-1878. for Neuroscience in Los Angeles: Research Shows Werker JF & Tees RC (2002). Cross-language speech Correlation Between Music and Language perception: Evidence for perceptual reorganization Mechanisms; http:// during the first year of life. Infant Behaviour and www.menc.org/information/advocate/brain.html Development, 25: 121-133. Sari R (1996). Unpublished MA Thesis: The Role of Zhang Y, Kuhl PK, Imada T, Iverson P, Pruitt J, Receptive Skills in Enhancing Second Language Kotani M & Stevens E (2000). Neural plasticity Acquisition, http://www.maxpages.com/thena/- revealed in perceptual training of a Japanese adult From_Reception_to_Production. listener to learn American /l-r/ contrast: a whole- Schlaug G (2003). The brain of musicians. In: The head magnetoencephalography study, Sixth cognitive neuroscience of music. In: Peretz I & International Conference on Spoken Language Zatorre R (Eds), Oxford University Press, 366-381. Processing (ICSLP 2000) Beijing, China, 16-20. Schön D, Magne C & Besson M (2004). The music of speech: Music training facilitates pitch processing

74