The perception of lexical stress in Spanish

Joaquim Llisterri, María Machuca, Carme de la Mota, Montserrat Riera, Antonio Ríos
Universitat Autònoma de Barcelona
E-mail: {joaquim | maria | carme | montse | mestre}@liceu.uab.es

ABSTRACT

As in other languages, stress in Spanish is signalled by three simultaneous acoustic cues: fundamental frequency (F0), duration and intensity. In this experiment, the role of these parameters in the perception of lexical stress in isolated words has been studied using natural resynthesised speech. Results show that the F0 contour alone is not enough to allow the identification of the stressed syllable of a word. However, in combination with duration, intensity or both duration and intensity, F0 is a relevant acoustic cue for the perception of lexical stress. On the other hand, intensity and duration, either combined or in isolation, are not sufficient for the identification of the stressed syllable within a word.

1. INTRODUCTION

It is generally acknowledged that, as in other languages, stress in Spanish is signalled by three simultaneous acoustic cues: fundamental frequency (F0), duration and intensity. An earlier perceptual study using synthetic stimuli [1] concluded that, in Spanish, F0 is the only parameter systematically related to the identification of the stressed syllable of a word, while the role of duration depends on the stress pattern. The experiments with natural resynthesised speech reported in [2, 3] indicated that a replacement of the F0 contour is not enough to induce the identification of the stressed syllable if the other two parameters are not modified.

Taking advantage of the fact that Spanish is a free accent language (i.e. lexical stress can appear in any syllable of the word), a perceptual experiment with natural resynthesised speech has been designed to assess the role of F0, duration and intensity in the identification of the stressed syllable in isolated words, using lexical items with the same segmental content but with differences in stress placement. To take into account the role of lexical knowledge, phonologically acceptable but non-existent words have also been included in the test corpus. The contribution of each acoustic cue has been examined both in isolation and in combination with other cues.

2. EXPERIMENTAL PROCEDURE

The primary data for the experiment were extracted from the analysis of a corpus of isolated words read by a native speaker of Castilian Spanish. The recordings were then manipulated to obtain the test stimuli following the procedure explained below.

The recorded corpus consisted of four meaningful three-syllable words with constant CV structure allowing the stress to be placed on the first (proparoxytone), second (paroxytone) and final (oxytone) syllable (número, numero, numeró; límite, limite, limité; médico, medico, medicó; válido, valido, validó) and four meaningless words in which the position of the stress was also varied (*núlibo, *nulibo, *nulibó; *ládebo, *ladebo, *ladebó; *máledo, *maledo, *maledó; *lúguido, *luguido, *luguidó).

The 240 target words (10 repetitions of 8 words x 3 stress patterns) were analysed using the Praat software (©P. Boersma & D. Weenink, Institute of Phonetics, University of Amsterdam, http://www.praat.org). F0 was measured at the beginning, centre and end of each of the three vowels in the word. Intensity values were obtained from five equidistant points within each vowel. Vowel duration was also analysed. Mean values for the 10 repetitions of each word were obtained, and they were used to create the set of basic stimuli.

Several modifications were then performed to obtain the stimuli used in the perceptual test. In words with lexical stress on the first syllable (as válido), mean F0, duration and intensity values for each vowel were replaced by the mean F0, duration and intensity values found in the equivalent word with stress on the second syllable (as valido). Moreover, in words with lexical stress on the second syllable (as valido), the F0, duration and intensity values for each vowel were replaced by the values found in the corresponding words with lexical stress on the final syllable (as validó). Oxytone words were not manipulated, to avoid shifting an F0 peak or duration and intensity values across a word boundary.

Figures 1 and 2 show an example of the manipulation of a single parameter: the F0 of the vowels in the words válido and valido is replaced by the F0 of the corresponding vowels in valido and validó respectively, while the original duration and intensity are maintained.

[Figure 1. Panels: (a) válido with original F0 contour; (b) válido with F0 contour extracted from valido.]
Figure 1: Waveform, F0 (black line) and intensity (grey line) for the word válido (a) and waveform, F0 and intensity after superimposing the F0 contour of the word valido (b).

[Figure 2. Panels: (a) valido with original F0 contour; (b) valido with F0 contour extracted from validó.]
Figure 2: Waveform, F0 (black line) and intensity (grey line) for the word valido (a) and waveform, F0 and intensity after superimposing the F0 contour of the word validó (b).

To create the test stimuli, each word was resynthesised with the replaced values using PSOLA as implemented in Praat.

The values of each acoustic parameter (F0, duration and intensity) were first modified individually, maintaining the original values of the other two parameters. Then, the values for two parameters were superimposed together (F0 and duration, intensity and duration, and F0 and intensity), maintaining the original values of the third. Finally, the values of the three parameters were simultaneously modified by replacing all the original values. This strategy has allowed the study of the perceptual effects of each acoustic cue both in isolation and in combination with others.

Two different kinds of tasks were proposed to the subjects who participated in the experiment. In the first one (test 1), they were asked to identify the syllable bearing the stress (the first, the second or the last) in a total of 336 isolated words. In the second task (test 2), subjects were asked whether 280 pairs of words were equal or different in their stress pattern. Within each test, stimuli were presented in random order.

The tests were administered through individual headphones at the Language Laboratory of the Department of French and Romance Philology at the Universitat Autònoma de Barcelona. Subjects were given written instructions on paper as well as an oral briefing; they were warned of the presence of existent and non-existent words in the test, and of the requirement that no blank replies were allowed. A set of five training stimuli was included at the beginning of each test, and questions on the procedure were taken after each training period. As there were more than 600 stimuli, the test was divided into two sessions: one in which stimuli with modifications in a single acoustic cue were presented and another one in which stimuli with cues in combination were used. In order to avoid listeners' fatigue, breaks were introduced in each session, during which simple distracting activities were carried out. Thirty speakers of Spanish, students at the Universitat Autònoma de Barcelona, with ages between 18 and 45 years old, responded to the test. A total of 18480 replies were obtained.

3. RESULTS

Resynthesised natural items do not present special problems to the listeners when they are asked to identify the stress pattern. For stimuli without manipulation in the acoustic parameters (i.e. with the averaged values obtained from the reference speaker), correct identification of stress placement ranges from 92.97% to 100% in meaningful words and from 91.41% to 100% in meaningless words.

Results corresponding to the judgements about stimuli with modified acoustic parameters are shown below. Results from the first test, in which subjects were asked to identify the syllable bearing the stress, are presented in Table 1. Results obtained from the second test, in which subjects had to decide whether a pair of words coincided in their stress pattern, are shown in Table 2. In both tables, results for the manipulation of each acoustic parameter in isolation are presented first (F0, D, I), followed by those obtained with the modification of paired acoustic cues (D+I, F0+D, F0+I); finally, the results of the simultaneous manipulation of F0, duration and intensity are given (F0+D+I). The last column of each table indicates whether a row corresponds to meaningful or meaningless words. The second column indicates the modification that has been performed in the stimuli: for example, "PP with P values" means an originally proparoxytone word in which the values of the target acoustic parameters have been replaced by those of a paroxytone word. In columns 3, 4 and 5 of Table 1, the percentages of identification of the stimuli as a proparoxytone (PP), a paroxytone (P) or an oxytone (O) word are presented. In Table 2, "S" and "D" in the columns labelled "PP", "P" and "O" correspond to the percentage of identification of paired words as same or different.

| Parameter | Modification | PP | P | O | Words |
|---|---|---|---|---|---|
| F0 | PP with P values | 61.67 | 38.33 | 0 | meaningful |
| F0 | PP with P values | 52.78 | 45 | 2.22 | meaningless |
| F0 | P with O values | 15 | 70.56 | 14.44 | meaningful |
| F0 | P with O values | 6.11 | 69.44 | 24.44 | meaningless |
| D | PP with P values | 99.44 | 0.56 | 0 | meaningful |
| D | PP with P values | 96.67 | 3.33 | 0 | meaningless |
| D | P with O values | 2.22 | 96.11 | 1.67 | meaningful |
| D | P with O values | 13.33 | 85 | 1.67 | meaningless |
| I | PP with P values | 98.33 | 1.11 | 0 | meaningful |
| I | PP with P values | 98.33 | 1.67 | 2.22 | meaningless |
| I | P with O values | 0 | 97.78 | 2.22 | meaningful |
| I | P with O values | 1.33 | 97.33 | 1.33 | meaningless |
| D+I | PP with P values | 93.23 | 6.77 | 0 | meaningful |
| D+I | PP with P values | 91.15 | 8.85 | 0 | meaningless |
| D+I | P with O values | 5.21 | 80.73 | 14.06 | meaningful |
| D+I | P with O values | 7.03 | 79.69 | 16.41 | meaningless |
| F0+D | PP with P values | 4.17 | 94.79 | 1.04 | meaningful |
| F0+D | PP with P values | 13.02 | 80.73 | 6.25 | meaningless |
| F0+D | P with O values | 5.73 | 16.67 | 77.60 | meaningful |
| F0+D | P with O values | 16.15 | 16.15 | 67.70 | meaningless |
| F0+I | PP with P values | 20.31 | 79.17 | 0.56 | meaningful |
| F0+I | PP with P values | 22.92 | 70.31 | 6.77 | meaningless |
| F0+I | P with O values | 1.56 | 9.38 | 89.06 | meaningful |
| F0+I | P with O values | 4.17 | 28.91 | 66.40 | meaningless |
| F0+D+I | PP with P values | 1.04 | 98.44 | 0.52 | meaningful |
| F0+D+I | PP with P values | 4.17 | 91.14 | 4.69 | meaningless |
| F0+D+I | P with O values | 0 | 1.56 | 98.44 | meaningful |
| F0+D+I | P with O values | 9.38 | 3.12 | 87.50 | meaningless |

Table 1: Results in % from test 1 for meaningful and meaningless words. (F0 = fundamental frequency; D = duration; I = intensity; PP = proparoxytone; P = paroxytone; O = oxytone).

As for the identification test, it can be observed that meaningful proparoxytone words with a single modified parameter are hardly ever perceived as paroxytone; in addition, under the same conditions, meaningful paroxytone words are perceived as oxytone in very few cases. If the F0 contour is the superimposed parameter, the results show the same tendency but with higher scores, since meaningful proparoxytone words with a paroxytone F0 contour are perceived as paroxytone in 38.33% of the cases, while meaningful paroxytone words with an oxytone F0 contour are perceived as oxytone in 14.44% of the cases. On the other hand, proparoxytone stimuli with duration values taken from paroxytone words are perceived as paroxytone in no more than 3.33% of the cases for meaningless words and 0.56% for meaningful ones, while paroxytone words with values from oxytone words are perceived as oxytone in a maximum of 1.67% of the cases. Results corresponding to intensity values show a very similar trend.

As for the cases in which two parameters have been simultaneously modified, the results show that the effect of the modification is not really perceived by the listeners unless one of the superimposed parameters is the F0 contour. For instance, although meaningful proparoxytone words with intensity and duration values from paroxytone words are perceived as paroxytone in only 6.77% of the cases, they are clearly perceived as paroxytone (in 94.79% of the cases) when F0 and duration are the superimposed parameters. In a similar way, if the modified parameters are F0 and intensity, meaningful proparoxytone words are perceived as paroxytone in 79.17% of the cases. Percentages increase when the modification affects F0, intensity contours and duration simultaneously. In these cases the rates can reach 98.44%. Although percentages are always a bit lower, the same behaviour is observed in meaningless words in all cases.

It is interesting to note that the simultaneous modification of F0 and other acoustic parameters triggers a high percentage of responses showing that a change in the stress pattern has been detected. This trend is also confirmed by the judgements obtained from the second test, as shown below.

In Table 2, it can be observed that meaningful proparoxytone words with intensity values taken from paroxytone words are perceived as equal to paroxytones in only 0.83% of the cases. When duration is manipulated the same results are obtained (0.83%). If the F0 contour is manipulated but the original values of the other parameters are maintained, listeners perceive the stimuli neither as equal to paroxytones (32.5%) nor as equal to proparoxytones (45.83%). The same tendency is noted for meaningful paroxytone words: they are not perceived as equal to oxytones when duration (0.83%) or intensity (0%) is individually manipulated. In the case of F0 manipulation, listeners consider that they are not oxytones (94.17%), but they are identified as paroxytones in 59.17% of the replies. The same tendency is observed in meaningless words.

When two acoustic cues are combined, results show a similar tendency when duration and intensity appear together: meaningful proparoxytone words with intensity contours and duration values from paroxytone words are perceived as equal to paroxytones in 4.69% of the cases and as different from proparoxytones in 14.84% of the cases; paroxytone words with intensity contours and duration values from oxytone words are perceived as equal to oxytones in 14.84% and as different from paroxytones in 24.22% of the cases. On the contrary, proparoxytone words are perceived as equal to paroxytone words in 98.44% of the cases when F0 is combined with duration, and in 77.34% when F0 is combined with intensity. Paroxytone words are perceived as equal to oxytone words in 73.44% of the cases when F0 is combined with duration, and in 89.84% when F0 is combined with intensity. Similar results are observed in meaningless words. Finally, results obtained from the combination of the three acoustic parameters are similar to those obtained from the manipulation of F0 with one of the other two parameters (duration or intensity).

| Parameter | Modification | PP: S | PP: D | P: S | P: D | O: S | O: D | Words |
|---|---|---|---|---|---|---|---|---|
| F0 | PP with P values | 45.83 | 54.17 | 32.5 | 67.5 | | | meaningful |
| F0 | PP with P values | 50 | 50 | 42.5 | 57.5 | | | meaningless |
| F0 | P with O values | | | 59.17 | 40.83 | 5.83 | 94.17 | meaningful |
| F0 | P with O values | | | 75.83 | 24.17 | 14.17 | 85.83 | meaningless |
| D | PP with P values | 98.33 | 1.67 | 0.83 | 99.17 | | | meaningful |
| D | PP with P values | 100 | 0 | 2.5 | 97.50 | | | meaningless |
| D | P with O values | | | 95.83 | 4.17 | 0.83 | 99.17 | meaningful |
| D | P with O values | | | 84.17 | 15.83 | 0 | 100 | meaningless |
| I | PP with P values | 98.33 | 1.67 | 0.83 | 99.17 | | | meaningful |
| I | PP with P values | 100 | 0 | 0.83 | 99.17 | | | meaningless |
| I | P with O values | | | 98.33 | 1.67 | 0 | 100 | meaningful |
| I | P with O values | | | 96.67 | 3.33 | 2.22 | 97.78 | meaningless |
| D+I | PP with P values | 85.16 | 14.84 | 4.69 | 95.31 | | | meaningful |
| D+I | PP with P values | 88.28 | 11.72 | 13.28 | 86.72 | | | meaningless |
| D+I | P with O values | | | 75.78 | 24.22 | 14.84 | 85.16 | meaningful |
| D+I | P with O values | | | 72.92 | 27.08 | 5.21 | 94.79 | meaningless |
| F0+D | PP with P values | 0.78 | 99.22 | 98.44 | 1.56 | | | meaningful |
| F0+D | PP with P values | 0.38 | 90.62 | 89.84 | 10.16 | | | meaningless |
| F0+D | P with O values | | | 14.06 | 85.94 | 73.44 | 26.56 | meaningful |
| F0+D | P with O values | | | 28.91 | 71.09 | 64.06 | 35.94 | meaningless |
| F0+I | PP with P values | 8.59 | 91.44 | 77.34 | 22.66 | | | meaningful |
| F0+I | PP with P values | 19.53 | 80.47 | 66.41 | 33.59 | | | meaningless |
| F0+I | P with O values | | | 12.50 | 87.5 | 89.84 | 10.16 | meaningful |
| F0+I | P with O values | | | 43.75 | 56.25 | 50 | 50 | meaningless |
| F0+D+I | PP with P values | 14.84 | 85.16 | 71.09 | 28.91 | | | meaningful |
| F0+D+I | PP with P values | 25.78 | 74.22 | 61.72 | 38.28 | | | meaningless |
| F0+D+I | P with O values | | | 17.97 | 82.03 | 83.59 | 16.41 | meaningful |
| F0+D+I | P with O values | | | 47.92 | 52.08 | 44.79 | 55.21 | meaningless |

Table 2: Results in % from test 2 for meaningful and meaningless words. (F0 = fundamental frequency; D = duration; I = intensity; PP = proparoxytone; P = paroxytone; O = oxytone; S = same; D = different).

4. CONCLUSIONS

The difficulty of studying the influence of acoustic parameters on stress perception stems from the fact that they act simultaneously in natural speech. The method used in this experiment offers a way of isolating the effect of each parameter while maintaining the naturalness of the stimuli by using resynthesised natural speech.

The results reveal that different conclusions can be obtained depending on the way F0, duration and intensity are combined. The superposition of only one of the three parameters corresponding to another stress pattern is not sufficient for listeners to perceive a clear change of the stress pattern. Only the superposition of F0 in combination with one or more parameters triggers a high number of responses indicating a change in the stress location.

The findings can be briefly summarized as follows. First, the position of the stress is correctly identified by subjects if F0 peak, duration or intensity values correspond to the lexically stressed syllable. Second, in those cases in which a single parameter (the F0 contour, the intensity contour or the duration values) has been replaced by the superimposed one in an attempt to displace the perception of prominence to the right, the syllable originally bearing the lexical stress is identified by listeners, in spite of the modification. This behaviour is found for meaningful and for meaningless words. Third, when the values of the F0 contour are superimposed together with the values of either of the other two parameters (duration or intensity), listeners identify the syllable aligned with the F0 peak as stressed. On the contrary, if F0 is not included in the combination, the syllable with original lexical stress is the one perceived as stressed by the listeners. Results for meaningless words show a similar tendency, but the percentages of identification are lower.

It can be concluded that, at least in isolated words, the F0 contour alone is not sufficient to induce the identification of the syllable aligned with the F0 peak as stressed if the other two parameters are not modified, but it is an essential acoustic cue in combination with duration, intensity or both duration and intensity.

REFERENCES

[1] E. Enríquez, C. Casado and A. Santos, "La percepción del acento en español", Lingüística Española Actual, vol. 11, pp. 241-269, 1989.

[2] J. Llisterri, M. J. Machuca, C. de la Mota, M. Riera and A. Ríos, "The role of F0 peaks in the identification of lexical stress in Spanish", in Phonetics and its Applications. Festschrift for Jens-Peter Köster on the Occasion of his 60th Birthday, A. Braun and H.R. Masthoff, Eds., pp. 350-361. Stuttgart: Franz Steiner Verlag, 2002.

[3] J. Llisterri, M. J. Machuca, C. de la Mota, M. Riera and A. Ríos, "Algunas cuestiones en torno al desplazamiento acentual en español", in La tonía: dimensiones fonéticas y fonológicas. México: El Colegio de México, 2002. http://liceu.uab.es/~joaquim/publicacions/Llisterri_et_al_2002.pdf
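The design figures reported in the experimental procedure can be cross-checked with a few lines of arithmetic. The following Python sketch is not part of the original study: the constant and function names are hypothetical, and it assumes that every one of the thirty listeners answered every stimulus in both tests, which the text implies (no blank replies were allowed) but does not state outright. It also enumerates the single, paired and triple cue manipulations described in the text as all non-empty combinations of F0, duration (D) and intensity (I).

```python
from itertools import combinations

# Design figures copied from the experimental procedure in the text.
REPETITIONS = 10        # recordings of each word
WORDS = 8               # 4 meaningful + 4 meaningless
STRESS_PATTERNS = 3     # proparoxytone, paroxytone, oxytone
TEST1_STIMULI = 336     # isolated words in test 1
TEST2_PAIRS = 280       # word pairs in test 2
LISTENERS = 30

CUES = ("F0", "D", "I")  # fundamental frequency, duration, intensity


def manipulation_conditions(cues=CUES):
    """Return every non-empty combination of cues, smallest sets first."""
    return ["+".join(combo)
            for size in range(1, len(cues) + 1)
            for combo in combinations(cues, size)]


def recorded_tokens():
    """Number of recorded target words in the corpus."""
    return REPETITIONS * WORDS * STRESS_PATTERNS


def expected_replies():
    """Total replies, ASSUMING each listener answers every stimulus."""
    return LISTENERS * (TEST1_STIMULI + TEST2_PAIRS)


if __name__ == "__main__":
    print(manipulation_conditions())
    print(recorded_tokens())
    print(expected_replies())
```

Under these assumptions the sketch yields seven manipulation conditions, 240 recorded tokens and 18480 replies, which agrees with the totals given in the text.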