Acoustic, Psychoacoustic, and Psychophysiological Measures of Distance in the Finnish Vowel Space
Total Page:16
File Type:pdf, Size:1020Kb
ACOUSTIC, PSYCHOACOUSTIC, AND PSYCHOPHYSIOLOGICAL MEASURES OF DISTANCE IN THE FINNISH VOWEL SPACE Mietta Lennes1, Anne Lehtokoski1, Paavo Alku2, and Risto NŠŠtŠnen1 1. University of Helsinki, Finland 2. University of Turku, Finland ABSTRACT present study deals with an acoustic and psychoacoustic The aim of this study was to compare different vowel description of the vowel space. representations and distance measures by applying them on A reliable perceptual distance measure would be valuable eighteen isolated semi-synthetic Finnish vowel stimuli. For when studying phonetic changes in vowel phones. The vowel spectral distance measures, the FFT power spectra and auditory systems of different languages could be compared through the spectra were calculated. Using different combinations of the dimensions and structure of their vowel spaces. Such a measure vowel stimuli, the amplitudes of the mismatch negativity (MMN) would also support clinical evaluation methods and help develop event-related brain response were measured individually for four and test perceptual models of speech perception. In order to adult subjects to study the psychophysiological distances develop clinically applicable psychophysiological measures of between Finnish vowels. A correlation was found between the sound perception, it is essential to have a reliable acoustic or Euclidean distances of both the FFT power spectra and the even psychoacoustic reference measure with which to compare auditory spectra of the vowels in comparison to the the brain responses. corresponding MMN amplitudes. The results are important for comparative studies of the vowel spaces of different languages. 1.2. Acoustic distance An FFT spectrum equally represents all frequencies that are 1. INTRODUCTION present in the signal. Plomp [7] developed a distance measure for Many studies have concentrated on searching for invariance in stationary complex tones, which calculates the distance between vowel sounds or looked for physical or perceptual correlates for two spectra. phoneme boundaries. However, vowel phones are not absolute nor stable objects. Therefore it is important to find comparative 1.3. Psychoacoustic distance methods for studying the relative structure and dimensions of the Formants (particularly the frequencies of three lowest formants) vowel spaces of different languages and individuals. The present are usually referred to as the acoustic parameters that mainly study compares the basic premises, possibilities and limitations determine the perceived quality of a vowel. The two dimensions of different acoustic and psychoacoustic vowel representations of the plane that can be printed on paper are a probable cause for and applies spectral and psychophysiological distance measures the general need to use two-dimensional vowel representations. to a set of Finnish vowel stimuli. Formants have provided solutions to this problem, e.g., two- or three-dimensional formant charts are widely used to visualize the 1.1. Vowel distance structure of vowel systems [4, 5, 6]. When measuring spectral distances for acoustic vowel sound Iivonen [3] points out that many factors contribute to the signals, one must obtain a numerical representation of each psychoacoustic vowel qualities. Therefore, it may be argued that sound under study. At present, this implies losing the temporal in stead of formant frequency combinations, the overall spectral information of the sound signal: sample spectra of both members pattern of vowels should be taken into account when developing of the vowel pair must be taken at a certain comparable point in a distance measure. time or the spectral properties of each signal averaged over a To obtain psychoacoustic or auditory spectra of vowels, the selected time interval. These spectral representations of vowel FFT power spectra may be filtered according to a model of sound signals may be further processed to obtain peripheral (and central) auditory processes. The Euclidean psychoacoustically weighed spectra. Another option for the distance version of Plomp's [7] formula has been shown to be a purposes of distance measurements would be to extract certain fairly good correlate of perceptual distance [8, 9]. parameters, e.g., formant frequencies from the vowel spectra. When determining the psychoacoustic spectral properties of 1.4. Psychophysiological distance steady-state speech signals, attention has been paid to several The changes in brain activity elicited by a certain sound stimulus principles: taking the ear sensitivity curve into account, can be studied by recording the auditory event-related potentials realization of the physiological frequency scale of hearing in mel (ERP) for the stimulus. The mismatch negativity (MMN) [10] is or Bark scales, using frequency resolution in accordance with the an ERP component, typically peaking at 100Ð200 ms from critical band, and including the frequency level masking effect stimulus onset. It is elicited when an infrequent "deviant" sound [1]. A recent alternative to the Bark scale is the ERB scale [2]. stimulus occurs in a sequence of repetitive "standard" sound Iivonen [3] states that in the description of a listener's stimuli. The amplitude of the MMN response has been shown to psychoacoustic vowel space there are three basic problems: "(1) reflect the perceived frequency difference between "standard" the scale problem, (2) the combination of parameters that define and "deviant" simple tone stimuli [11]. However, in the case of the space, and (3) the question of how sensitive the speech sounds, a language-specific phoneme identification psychoacoustical ability to resolve vowel differences is". The process may also contribute to the MMN [12]. page 2465 ICPhS99 San Francisco 1.5. Problems and hypotheses Vowel F1 F2 F3 This study attempts to visualize and compare the spectral i 320 2263 2770 distances and formant-based representations of a set of Finnish vowel stimuli. It was hypothesized that the spectral distances are y 320 1822 2200 reflected in the corresponding MMN amplitudes. u 320 566 2500 A basic problem in perceptual distance experiments is the fact that the use of the phonetic labels in dissimilarity judgments o 460 820 2468 cannot be distinguished from "purely" psychoacoustic, A 700 1130 2468 nonlinguistic judgment criteria. Fox [13] reported results of e 450 2198 2770 paired-comparison tests indicating that the similarity judgments of vowels are sensitive to subphonemic variations, i.e., the Ï 654 1695 2468 judgments are based at least partially on the acoustic P 450 1350 2540 representations rather than the phonetic labels of the stimuli. Therefore, it makes sense to compare spectral distances with i\y 320 2043 2485 MMN responses, which can be measured without the subjectÕs y\u 320 1194 2350 attention to the stimuli. u\o 410 730 2484 2. METHODS o\A 560 920 2468 2.1. Subjects i\e 385 2231 2770 Four subjects (all normal-hearing speakers of Finnish, right- handed, aged 21Ð23 years; one male) participated in the e\Ï 552 1947 2619 psychophysiological and vowel identification experiments. Two e\P 450 1650 2655 subjects had previous experience of synthetic vowels and all P\o 455 1000 2504 subjects of ERP experiments. i\P 385 1750 2655 2.2. Stimuli P\A 595 1240 2504 Eighteen isolated steady-state vowel sounds were created using a Table 1. The Finnish semi-synthetic vowel stimuli and their new method, Semi-synthetic Speech Generation (SSG), which formant frequencies (Hz). The formant frequencies for the produces synthetic vowels from natural glottal excitation in intermediate sounds (marked with two vowel symbols separated conjunction with an artificial vocal tract model [14]. The glottal with a slash /) were obtained by taking the mean formant excitation was computed from a sustained male voice phonation frequencies of their neighbours. of the Finnish vowel [A]. The utterance was automatically inverse filtered using a method described by Alku [15], hence Condition Standard Deviant 1 Deviant 2 Deviant 3 cancelling the effects of the vocal tract and lip radiation. The resulting glottal excitation was then re-filtered with an eighth- 1 i y y\u u order all-pole filter with coefficients adjusted to shift the 2 u y i\y i frequencies of the three lowest formants (F1, F2, F3) according to the desired vowel qualities. The formant frequencies (see 3 e P P\o o Table 1) were chosen on auditory grounds by two Finnish 4 o P e\P e phoneticians. Eight stimuli were to represent prototypical vowels 5 i e e\Ï Ï [I, y, u, o, A, e, Ï, P] of Finnish and ten stimuli [i\y, y\u, u\o, o\A, i\e, e\Ï, e\P, P\o, i\P, P\A] were intermediates of these 6 Ï e i\e i prototypes. The stimuli were cut to the duration of 400 ms and 7 u o o\A A filtered with a Hanning filter (10 ms r/f times) to avoid clicks. 8 A o u\o u 2.3. Experiments 9 i P P\A A Four Finnish adult subjects were presented with sequences of 10 A P i\P i different combinations of the vowel stimuli using the mismatch negativity experimental paradigm. Each stimulus condition Table 2. Stimulus conditions for the ERP measurements. involved one frequent (standard) stimulus, which was randomly Intermediate vowel stimuli are marked with a slash /. replaced by one of three deviant stimuli. The different stimulus conditions are shown in Table 2. EEG was measured and the 3. RESULTS stimulus sequences presented to the subject while he or she was 3.1. Spectrum analysis reading a book. Measurement sessions were repeated until a FFT power spectra were calculated with a 512-point Hamming minimum of 120 averaged samples was reached for each subject window from the temporal midpoints of all vowel stimuli. and each deviant stimulus. The average MMN amplitudes for the Frequency components from 0 to 11025 Hz were included in 43 deviant stimuli from electrode Cz were measured (of a 50 ms Hz increments. The auditory spectra were obtained by weighting time frame at the largest negative peak between 100 and 250 ms the power spectra by the ear sensitivity function and the from stimulus onset) and separately averaged for each stimulus spreading function.