Intelligibility of Frequency-Lowered Speech Produced by a Channel Vocoder
Total Page:16
File Type:pdf, Size:1020Kb
Department of Veterans Affairs Journal of Rehabilitation Research and Development Vol . 30 No . 1, 1993 Pages 26—38 Intelligibility of frequency-lowered speech produced by a channel vocoder Miles P . Posen, MS ; Charlotte M . Reed, PhD; Louis D . Braida, PhD Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139 Abstract—Frequency lowering is a form of signal process- INTRODUCTION ing designed to match speech to the residual auditory capacity of a listener with a high frequency hearing loss. A vocoder-based frequency-lowering system similar to One general class of approaches toward lower- one studied by Lippmann was evaluated in the present ing of the speech spectrum for individuals with high study. In this system, speech levels in high frequency frequency hearing impairment is based on detecting bands modulated one-third-octave bands of noise at low high frequency information and recoding it through frequencies, which were then added to unprocessed the use of low frequency signals which take advan- speech. Results obtained with this system indicated, in tage of the residual hearing of the listener . Two agreement with Lippmann, that processing improved the methods that have been employed to achieve such recognition of stop, fricative, and affricate consonants signal processing include frequency transposition when the listening bandwidth was restricted to 800 Hz. and channel vocoding . In schemes employing fre- However, results also showed that processing degraded quency transposition (1,2), information in a speci- the perception of nasals and semivowels, consonants not fied high frequency band is shifted downward using included in Lippmann's study . Based on these results, the frequency-lowering system was modified so as to suppress amplitude modulation or nonlinear distortion . In the processing whenever low frequency components dom- schemes employing channel vocoding (3,4,5,6), the inated the input signal. High and low frequency energies high frequency speech content is analyzed by a bank of an input signal were measured continuously in the of filters whose output envelopes are used to control modified system, and the decision to process or to leave the amplitude of signals from low frequency synthe- the signal unaltered was based on their relative levels. sis filters. Braida, et al . (7) reviewed studies of Results indicated that the modified system maintained the transposition or vocoding prior to 1979 and con- processing advantage for stops, fricatives, and affricates cluded that, in general, the benefits to speech without degrading the perception of nasals and semi- reception with these lowering schemes were generally vowels. The results of the present study also indicated small and restricted to a narrow class of speech that training is an important consideration when evaluat- sounds. Among the reasons offered by Braida, et al. ing frequency-lowering systems. for this lack of success were inappropriate selection of both the frequency range used to analyze high Key words: frequency-lowering systems, speech intelligi- frequency information and the form of the recoded bility, vocoder evaluation. signals, as well as the choice of the level of the recoded signals relative to the normal low frequency speech components . For example, in many of the vocoder systems evaluated in the past, the analysis Address all correspondence and requests for reprints to : Charlotte M. Reed, Room 36-751, Massachusetts Institute of Technology, 77 Com- filter outputs controlled the levels of low frequency monwealth Avenue, Cambridge, MA 02139 . sinewaves, which were the sole signal presented to 26 27 Section 1 . Digital Techniques in Acoustic Amplification : Rosen at al. the listener (3,5,6) . Not only do such systems fail to for further study . Specifically, this system consisted distinguish between voiced and unvoiced sounds, of four analysis bands (two of which were two-third- they also eliminate suprasegmental speech cues octave wide and two of which were one-octave wide) available in the low frequency speech range . Several and four synthesis bands composed of four one- recent studies employing either transposition or third-octave bands of noise whose center frequencies vocoding, however, have reported improved identifi- ranged from 400 to 800 Hz . Consonant identifica- cation of fricative and affricate sounds over that tion tests using CVC syllables composed from 16 obtained through low-pass filtering (8,9,10,11). consonants (C) and 6 vowels (V) were conducted on Velmans (9) described a transposer-based sys- 7 subjects with normal hearing for both frequency- tem in which high frequency information in the lowered speech and linearly amplified speech with range of 4,000-8,000 Hz was shifted downward into an 800-Hz bandwidth . Overall percent-correct iden- the range of 0-4,000 Hz using a balanced modula- tification increased by 10 percentage points from 36 tor, and then combined with the linearly amplified percent correct with linear amplification to 46 signal. In evaluations of consonant reception by percent with frequency lowering . For individual listeners with normal-hearing, conducted with or subjects, increases ranged from 4 to 14 percentage without frequency transposition, the speech signal points. was low-pass filtered to 900 Hz and combined with Several factors may underlie the positive results high-pass white noise . Consonant identification was observed by Velmans and Lippmann relative to superior for the transposer over low-pass filtering, those reported in other studies of transposition and both alone and in conjunction with speechreading vocoding for frequency lowering (7) . First, in both (by roughly 13 percentage points without speech- studies the recoded signals were presented at levels reading and 8 percentage points when speechreading that did not significantly interfere with normal low was also available) . Velmans and Marcuson (10) frequency speech sounds. Second, the frequency presented data which indicated that the device range selected to analyze high frequency informa- proved beneficial to impaired listeners whose resid- tion and the specific form of the recoded low ual hearing extended to 1,000 Hz (or beyond) but frequency signals may have contributed to improved who had little or no residual hearing above 4,000 performance. Hz. These listeners showed improvements in trans- The current study was concerned with extending posed consonant identification ranging from 11 to the investigation of the effects of vocoder-based 30 percent, although identification of only those frequency-lowered systems on the reception of con- consonants with strong high frequency speech cues sonants . In Experiment 1, the performance of (/s, j ,z, 3 , tf ,d3,t/) was investigated . Further re- listeners with normal hearing was evaluated using a sults obtained with a group of 25 students with system modeled after that described by Lippmann hearing impairment in schools for the deaf (11) (8) . In Experiment 2, a modified system designed to indicated that benefits for the transposition device improve upon results obtained in Experiment 1, was were larger in children with severe to profound high evaluated. The results of the current study are frequency losses than in those with mild to moderate compared with those of other relevant studies, and losses in this range . Again, evaluations were re- theoretical predictions are presented for perfor- stricted to consonants with strong high frequency mance through the frequency-lowered system in content. combination with speechreading. In a vocoding system developed by Lippmann (8), speech sounds in the 1,000-8,000 Hz range were analyzed with a bank of band-pass filters whose EXPERIMENT 1 : INITIAL EVALUATION OF A outputs controlled the levels of low frequency bands VOCODER-BASED FREQUENCY-LOWERING of noise in the 400-800 Hz range. The noise signals SYSTEM were added to the original speech signal, which was low-pass filtered to 800 Hz. Preliminary discrimina- Method tion tests conducted with a variety of choices for the System Description. A block diagram of the analysis and synthesis filters led to the selection of a system used in Experiment 1, modeled after that of system with our analysis and four synthesis bands Lippmann (8), is shown in Figure 1 . This system 28 Journal of Rehabilitation Research and Development Vol . 30 No . 1 1993 FREQUENCY LOWERING SYSTEM INPUT OUTP LOW- PASS FILTER ANALYSIS FILTERS COMPUTER CONTROLLED AMPLIFIERS NOISE SYNTHESIS GBy FILTERS I I I I I LEVEL DETECTORS I►,, I I I i , ATTENTUATION ♦ V ANALOG CONTROL DATA ACQUISITION LSI II MINI-COMPUTER Figure 1. Block diagram of the vocoder-based frequency-lowering system employed in Experiment 1. made use of a computer-controlled multiband bands to 0.14 msec for the highest-frequency band. speech processor (12) . High frequency speech infor- The sample values were used to determine the mation was analyzed by first passing speech through output levels of low frequency narrow-band noise a bank of eight contiguous one-third-octave filters signals, which were summed and added to the (General Radio 1925 Multifilter) with standard original speech signal . These noise-band signals were center frequencies in the range of 1 .0 to 5 .0 kHz generated by passing wide-band noise through four (1 .0, 1 .25, 1 .6, 2 .0, 2 .5, 3 .15, 4.0, 5 .0) . The outputs contiguous one-third-octave filters (GR 1925) whose of adjacent filters were combined using analog center frequencies ranged from 400 to 800 Hz . The summing amplifiers to form four analysis bands high frequency analysis bands and the low frequency with rejection rates of 48 dB/octave . The output synthesis filters were monotonically related in that levels of these bands were measured using rms level the lowest analysis channel controlled the lowest detectors that had logarithmic conversion, averaging synthesis band, the second-lowest analysis channel times of 20 ms, and dynamic ranges of 65-70 dB. controlled the second-lowest synthesis band, and so These levels were made available to an LSI-11 on .