A History of Vocoder Research at Lincoln Laboratory
Total Page:16
File Type:pdf, Size:1020Kb
B.Gold A History of Vocoder Research at Lincoln Laboratory During the 1950s, Lincoln Laboratory conducted a study of the detection of pitch in speech. This work led to the design ofvoice coders (vocoders)-devices that reduce the bandwidth needed to convey speech. The bandwidth reduction has two benefits: it lowers the cost oftransmission and reception ofspeech, and it increases the potential privacy. This article reviews the past 30 years ofvocoder research with an emphasis on the work done at Lincoln Laboratory. ifI could detennine what there is in the very rapidly basically a transmitter of waveforms. In fact, changingcomplexspeechwavethatcorrespondstothe telephone technology to date has been almost simple motions ojthe Zips and tongue, I could then totally oriented toward transmission while the analyze speechjorthesequantities, I wouldhaveaset ojspeech defining signals that could be handled as subject of speech modeling has remained of low-jrequency telegraph currents with resulting ad peripheral practical interest, in spite ofyears of vantages ojsecrecy, and more telephone channels in research by engineers and scientists from Lin the sameJrequency space as well as a basic under coln Laboratory, Bell Laboratories, and other standingoJthecarriernatureojspeechbywhichthe lip research centers. reader interprets speechjrom simple motions. -Homer Dudley, 1935 Modern methods ofspeech processing really beganinthe UnitedStateswiththe development Historical Background oftwo devices: the voice operated demonstrator (voder) [4] and channel voice coder (vocoder) [51. More than 200 years ago, the Hungartan Both devices were pioneered by Homer Dudley, inventorW. von Kempelen was the first to build whose philosophy ofspeech processing is quot a mechanical talking contrivance that could ed at the beginning of this article. mimic, albeit in a rudimentary fashion, the The appearance of the voder at the 1939 vartoils sounds of human speech [1, 2]. The World's FairinSanFranciscoandNewYorkCity device, constructed around 1780, proved that elicited intense curiosity. The device was con human speech production could be replicated. trolled by a human operator who operated a The next major development in the field was console similar to a piano keyboard (Fig. 1). The undoubtedly the most important: Alexander keys on the keyboard switched the appropriate Graham Bell's invention of the telephone. Al bandpass fIlters into the system; by depressing though there is no need to chronicle the well two or three of the keys and setting the wrist known events that led to the invention, nor any barto thebuzz(Le., voicing) condition, anopera need to discuss its subsequent impact on tor could produce vowel and nasal sounds. If humancommunications, itisinterestingto note the Wlist bar was set to the hiss (Le., voice that Bell's primary profession was that of a less) condition, sounds such as the voiceless speech scientist, Le., someone with knowledge fricatives could be generated. Special keys ofthe intricacies ofthe humanvocal apparatus. were used to produce the plosive and affrica Reference 3, which contains a description of tive sounds (the chin cheese and) injaw), and Bell's Harp telephone, reveals the inventor's a pitch pedalwas used to control the intonation keen understanding of the rudiments of the of the sound. spectral envelope ofspeech. While the voder was the first electronic syn But the telephone that Bell invented was thesizer, the channel vocoder was the first The Lincoln Laboratory Journal. Volume 3, Number 2 (1990) 163 Gold - A History oJVocoder Research at Lincoln Laboratory dard). Thus the channel vocoder was qUickly recognized as a means ofreducing the speech bit rate to some number that could be handled through the average telephone channel. Even tually the military set the standard rate at 2.4 kb/sec. To understand Dudley's conceptofmore tele phone channels in the same frequency space, it is important to realize that human speech pro duction depends on relatively slow changes in the vocal-tract articulators such as the tongue and the lips. Thus, ifwe could develop accurate models of the articulators and estimate the parameters of their motion, we could create an analysis-synthesis system that has a low data rate. Dudley managed to bypass the difficult task of modeling the individual articulators by realizing that he could lump all articulator motion into one time-varying spectral envelope. Furthermore, Dudley understood that, to a first approximation, the excitation produced by vo cal-cord vibrations (for voiced sounds) and re gions ofvocal-tractconstriction (for consonants) could be separated from the actual speech Fig. 1-0riginal sketch of voder by S. W. Watkins (8 Sep tember 1937). sounds that are produced by the motion of the tongue, the lips, and the participation of the nasal passage. Both the spectrum envelope and analysis-synthesis system: that is, the channel excitation parameters must vary slowly (be vocoder derived certain parameters from a cause the articulators themselves vary slowly) speech wave, and the parameters were then and thus transmission ofthese derived parame used to control a synthesizer that reproduced ters requires only several hundred hertz of the speech. bandwidth rather than the usual 3 to 10 kHz To paraphraseDudley, vocoders could lead to needed to transmitspeechvia normal telephony advantages of more secure communications, methods. and a greater number of telephone channels in In an interesting aside, an informative and the same frequency space. Both of Dudley's entertaining paper by W.R. Bennett (6) gives a predictions were correct, but the exact ways in historical survey of the X-System of secret te which they came to pass (or are coming to pass) lephony that high-level Allied personnel used probably differed from what he imagined. extensively dUring World War II. For example, In 1929, digital communications was un during the invasion of Normandy in 1944, the known. When digitization did become feasible, system was used for communications between researchers realized that the process could London and Washington. Details about X-Sys provide a means of making communications tem have only recentlybecome declassified, and less vulnerable to eavesdropping. Digitization, from thatinformationX-Systemappears to have however, required wider transmission band been a sophisticated version of Dudley's chan widths. For example, the current 3-kHz band nel vocoder. X-System included such features width ofa local telephone line cannottransmita as pulse-coded modulation transmission, loga pulse-coded modulation speech signal that is rithmic encoding of the channel signal, and, of coded to 64 kb/sec (the present telephone stan- course, enciphered speech. 164 The Lincoln Laboratory Journal. Volume 3. Number 2 (1990) Gold - A History oJVocoder Research at Lincoln Laboratory Advances in Telephony tain that the growth ofcellular phone traffic will require vocoding techniques. In an article that appeared in National Geo graphic magazine [7], F.B. Colton gives an en grossing account of the history of telephone Work at Lincoln Laboratory transmission. Figure 2, which is from that ar Lincoln Laboratory's history in vocoding ticle, shows the telephone wires on lower Broad commenced in the 1950s with extensive re way in New York City in 1887. It is clear that search in pattern recognition, which led to progress in telephony could easily have been work on detecting the pitch of speech. Some broughtto a haltifnotfor suchimprovementsas of the Laboratory's contributions in vocoding underground cables and multiplexing tech include the niques. At present, fiber optical transmission 1. development ofpitch-detection algorithms and lasers are allowing another great leap for and hardware that have served as stan ward in capacity as telephone traffic continues dards for almost 30 years, to increase. 2. development ofthe first vocoder hardware Such a technological advance brings to mind for satellite and aircraft narrowband the following question: Will optical fibers with speech communications, their enormous bandwidth make vocoding 3. creation of the first real-time computer technology obsolete? The author believes the programs that used linear predictive cod answer is no because of the great potential ing (LPC) and homomorphic vocoding, growth of radio telephones, e.g., cellular 4. use ofvocoders for packet speech commu phones. The permissible bandwidths of such nications, devices are limited by nature; thus it seems cer- 5. use of high-frequency (HF) channels for vocoding, 6. invention of a novel analysis-synthesis system (the Sine Transform Coder), and 7. introductionofmethods thatallowvocoder research and development to be carried out in real time on general-purpose com puters. This article discusses the first, second, third, and seventh contributions, and gives references to the remaining items. The Vocoder Figure 3 is a schematic diagram ofa vocoder. In the left ofthefigure, thevocal-source analyzer finds the correct time-Varying excitation para meters of the input speech (discussed in the following section, "Pitch Detection"). A parame ter is modeled as either a quasi-periodic pulse train or a noise source. The vocal-tract analyzer finds the time-varying shape of the spectral envelope of the input speech by means of an appropriate algorithm (discussed in the subse quent section, "Speech Synthesis"). The excita Fig. 2-Telephone wires on lower Broadway in New York tion parameters and spectral envelope