Inter-Dependent Categorization of Voices and Segments

ICPhS XVII Regular Session Hong Kong, 17-21 August 2011

Anne Cutler (a), Attila Andics (a) & Zhou Fang (b)
(a) Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; (b) Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, the Netherlands

ABSTRACT

Listeners performed speeded two-alternative choice between two unfamiliar and relatively similar voices or between two phonetically close segments, in VC syllables. For each decision type (segment, voice), the non-target dimension (voice, segment) either was constant, or varied across four alternatives. Responses were always slower when a non-target dimension varied than when it did not, but the effect of phonetic variation on voice identity decision was stronger than that of voice variation on phonetic identity decision. Cues to voice and segment identity in speech are processed inter-dependently, but hard categorization decisions about voices draw on, and are hence sensitive to, segmental information.

Keywords: voice, vowels, consonants, Garner task

1. INTRODUCTION

The processing of speech delivers information on multiple levels, and listeners appear capable of using any and all levels of information from which they can benefit. One of the types of information presented in speech signals is the talker's voice, which allows listeners to recognize talker identity, or derive relevant speaker-specific information even from the speech of unfamiliar talkers.

Much recent evidence suggests that information about voice is not discarded by normalization processes, but is processed together with phonetic and lexical cues to the spoken message. Indeed, listeners use knowledge of talkers in recognition in many ways [9, 10, 15], suggesting that phonetic and voice processing are highly inter-dependent.

A classic method for testing inter-dependence (or independence) of levels of processing is the selective attention paradigm, or Garner task [7]. Participants make decisions on (usually: categorize) stimuli that vary in one dimension, while variation on a different dimension is either present or absent; for instance, categorizing stimuli as beginning with /b/ or /d/, given tokens with varying vowels (ba, bi, di) versus a constant vowel (ba, ba, da). Variation in irrelevant dimensions tends to slow target dimension decisions.

The paradigm's attraction is that the decision is simple (two-alternative forced choice: 2AFC) and participants' responses soon stabilise at a fast and consistent level. If, in such a situation, participants cannot selectively attend to one dimension and ignore variation in another, then processing of the two dimensions is likely to be inter-dependent in general. In contrast, if selective attention to a dimension is possible, i.e., decisions are unaffected by variation in another dimension, then the two dimensions can clearly be processed independently.

The task was developed for the study of visual perception (decisions about horizontal and vertical position of a dot proved dependent; decisions on size of a circle and angle of a line through it proved independent [8]). It was rapidly deployed in research on speech perception [4, 12, 19, 20]. In these studies the structure of the visual experiments (two levels each of two dimensions) was applied to auditory categories. A central issue at the time was whether auditory processing needed to be completed before phonetic processing could begin. If so, then auditory categorization should be impervious to phonetic variation, but phonetic classification would always be influenced by irrelevant auditory variation.

In fact, most results showed mutual dependence: whatever the dimensions, irrelevant variation slows RTs. Consonant choices in CV syllables are slowed by vowel variation, and vice versa [20]; vowel choice is affected by pitch or loudness variation, and vice versa [12]; choice of place of articulation is slowed by variation in manner of articulation, and vice versa [4]; choice between lexical tones on CV syllables is slowed by either vowel or consonant variation, and vice versa [11, 18]; choice between two nonsense forms is slowed by varying stress [16].

But dependencies were not always symmetrical, indicating that some processes drew on information from other signal dimensions; place decisions were affected by manner variation more than vice versa [4], and consonant decisions were affected by pitch variation more than vice versa [11]. Hard decisions were most susceptible to irrelevant variation, while easier decisions were less affected [3, 4, 18, 19].

Reflecting the origin of the selective attention method in the psychophysical tradition [7], these experiments, with few exceptions, used synthetic stimuli in which everything except the two dimensions of interest was held constant. This has clear advantages for interpreting findings, but is difficult to apply to multidimensional categories or complex dimensions such as timbre. Talker voice is such a category. Fortunately, result patterns in studies using natural materials (e.g., [16, 18]) do not noticeably differ from those in the studies with synthesized stimuli.

Mullennix and Pisoni [13] compared phonetic to voice processing, with naturally spoken real-word stimuli. This study was more complex than prior studies, involving up to 16 levels per dimension. Participants categorized words (e.g., bad, pad, pill, or bill) as beginning with /b/ or /p/, or they classified the voice speaking them as that of a male or a female talker. Overall, RTs were faster in the talker decisions than in phonetic decisions. Mutual dependence again appeared, in that variation in either non-target dimension affected judgements on the other dimension. Crucially, the effects were asymmetric: phonetic decisions were affected more strongly by variation in number of talkers than voice decisions by variation in number of words used.

Listeners' voice processing, however, goes far beyond male-female judgement. Knowledge about talker identity informs phonetic and lexical processing [9, 10, 15]; phonetic category boundaries are rapidly adjusted to cope with talker-specific pronunciations [5, 14]. In the present study a classic Garner experimental design is used with a more challenging, but at the same time natural, voice identity categorization task, along with a phonetic categorization task that is also more challenging than syllable-initial stop consonant categorization. Listeners performed 2AFC between (a) two quite similar male voices, and either (b) two post-vocalic alveolar consonants differing in manner, or (c) two preconsonantal central short vowels. The non-target dimension was either constant, or varied.

Although new voices can be easily learned [2], establishing a new category (here, for the two voice identities) will, we predict, always result in longer RTs and higher error rates than recognizing a known category (here, the segments). The principal issue concerns variation interference: does it occur in each direction, and if so, are the effects symmetrical or not? We predict that voice identity categorization will be harder, and hence show greater interference, than segment categorization.

2. EXPERIMENT

2.1. Participants

30 native Dutch-speaking University of Nijmegen undergraduates took part in return for a small remuneration. (Results of six further participants were lost due to equipment malfunction.)

2.2. Stimuli

For [1, 2], a quite homogeneous set of young male non-smoking native speakers of Dutch without speech problems or recognizable regional accents (age range: 18-30) had recorded multiple tokens of the eight syllables met [mɛt], mes [mɛs], mot [mɒt], mos [mɒs], let [lɛt], les [lɛs], lot [lɒt] and los [lɒs]. Four speakers with similar F0 range were selected. The recordings, which had been sampled at 44100 Hz, 16 bits per sample, and equalized for average amplitude, were truncated to VC syllables by excising the initial consonant, using PRAAT. For two speakers, who were used as target voices, six tokens of each of the VC syllables et, es, ot and os were chosen, and for the remaining two (non-target) speakers, six tokens each of et, es and os. Segment durations were measured for all tokens.

Two sets of four experimental blocks were constructed. One set contained two blocks each of vowel or voice targets, the other contained two each of consonant or voice targets. In each pair of blocks with the same target type, one block had a constant context (the same syllable, for voice targets, or the same voice, for segment targets), while the other had a varying context. For vowel decisions, the consonant was always /s/, and for consonant decisions, the vowel was always /ɛ/. The constant syllable for voice decisions was ot, and the constant voice for segment decisions was “Peter” (see below); the varying context for voice decisions was all four syllables, and the varying context for segment decisions all four voices. There were 80 trials per block: 16 practice and 64 experimental trials (32 with each target alternative).

2.3. Procedure

Participants were seated in a sound-attenuated booth with a monitor and a two-key response box in front of them. They heard the stimuli binaurally over Sennheiser headphones. 16 participants heard the set of four vowel/voice blocks, 14 the consonant/voice blocks. Order of context blocks (constant,

2.4.2. RTs
