The Haskins Laboratories' Pulse Code Modulation (PCM) System

Behavior Research Methods, Instruments. & Computers 1990. 22 (6), 550.559 The Haskins Laboratories' pulse code modulation (PCM) system D. H. WHALEN, E. R. WILEY, PHILIP E. RUBIN, and FRANKLIN S. COOPER Haskins Laboratories, New Haven, Connecticut The pulse code modulation (PCM) method of digitizing analog signals has become a standard both in digital audio and in speech research, the focus ofthis paper. The solutions to some problems encountered in earlier systems at Haskins Laboratories are outlined, along with general proper ties of AID conversion. Specialized features of the current Haskins Laboratories system, which has also been installed at more than a dozen other laboratories, are also detailed: the Nyquist filter response, the high-frequency preemphasis filter characteristics, the dynamic range, the tim ing resolution for single- and (synchronized) dual-channel signals, and the form of the digitized speech files (header information, data, and label structure). While the solutions adopted in this system are not intended to be considered a standard, the design principles involved are of in terest to users and creators of other PCM systems. The pulse code modulation (PCM) system of digitiz Hence, the design objective was to store all the stimulus ing analog waveforms, in which amplitude samples are words in the computer, then convert them back to analog taken at frequent, regular intervals, can accurately repre form and bring them out in real time to a listener or, in sent continuously varying signals as binary digital num the usual case, to a dual-track recorder in whatever com bers (see Goodall, 1947). In the years since its introduc bination of stimuli, offsets, and levels the experimenter tion, PCM has become the standard technique for the might choose. digital sampling of analog signals for research purposes The system that resulted is still in use, but its very sin (in preference to such alternatives as delta modulation or gularity makes it mostly of historical interest. Certain predictive coding of various sorts; see Heute, 1988). PCM aspects of that system, however, are incorporated into systems are now available for almost any computer, and newer systems based on current, commercially available the recording industry's digital CDs have surpassed ana hardware. These newer systems are in place at Haskins log formats in sales. Laboratories and at more than a dozen other sites in the Although PCM systems are now commonplace, this has United States and abroad. These will be described in de not always been the case. When Haskins Laboratories tail so that current and future users of Haskins-based sys needed an interactive, multichannel system in the mid tems can have easy reference to them and so that designers 1960s, such systems simply were not available. A design of other systems can see the reasoning that went into the was devised, and implemented in an unconventional way, choices made. The basic principles of AID conversion will to meet the needs of our researchers. Much of our speech be outlined along the way. research at that time was concerned with perceptual responses to different words or syllables arriving at the EARLY PROBLEMS AND SOLUTIONS two ears simultaneously or with small temporal offsets. Stimulus tapes for such experiments could be made by In 1964, when the earliest PCM system at Haskins tape splicing (a separate tape for each ear) and rerecord Laboratories was begun, the challenge for our designers ing the signals onto a dual-track tape, but the method was was simply to create a system where none could be both error-prone and laborious. Moreover, each change bought. Although PCM was common in telephony, there in stimulus condition-different pairings of the overlap were no commercial systems available for programma ping words, differences in relative onset time or in rela ble computers. We therefore designed a system to be in tive level-required doing the whole job over again. terfaced with a Honeywell DDP24 computer (and later on a DDP224) with 8K of memory. Although only brief stretches of speech could be digitized directly into core The writing of this article was supported by NIH Contract NOl-HD memory, double buffering allowed the system to deal with 5-2910 to Haskins Laboratories. We thank Michael D'Angelo, Vincent continuous speech input; that is, the incoming digital Gulisano, Mark Tiede, Ignatius G. Mattingly. Patrick W. Nye, Tom stream was stored alternately in one of two buffer areas Carrell. David B. Pisoni, and two anonymous reviewers for helpful com of memory while earlier samples were being read out from ments. We also thank Leonard Szubowicz for the time and care spent the other buffer and written to digital tape. For output, designing and implementing the original version of the Haskins PCM software. Correspondence should beaddressed to D. H. Whalen, Haskins 2.8 sec of speech could be called up at will directly from Laboratories, 270 Crown St., New Haven. CT 06511. core memory. Longer sequences could be compiled onto Copyright 1990 Psychonomic Society, Inc. 550 HASKINS PCM SYSTEM 551 digital tape and then read off from the tape in near-real not access it. However, the DMAs could, and the pro time. For two-channelsynchronized output, the samples gram was capable of telling them how to do so. In this storedon the tape alternated betweenthe two speechchan way, an adequateamountof memorywas available to sus nels. Later, faster disks became available, so that long, tain a throughput of about 40,000 data samples per sec one-channel sequences could be output without going to ond, divided among up to four channels. tape. The same might have happenedfor the two-ehannel A continuing concern was the synchronization of any output, except that technologypassed this system by, and two PCM channels. This wasaccomplished by settingany it disappeared when the DDP 224 was liquidated for its two channels to wait for the same clock. When the clock gold content in 1982. is started, the two channels beginat exactlythe sametime. The nextchallenge was to meetthe growingdemands of The primary goal of this feature was the easy creation an increasing research field by adding more channels that of stimulus tapesfor dichotic listening procedures (Cooper could access a set of commondisks, avoiding both the re & Mattingly, 1969). It also allowed the simultaneous in cording on digital tape and the limitation to one user at put of two analog channels (e.g., speech and laryngo a time. The result was a multichannel PCM system, which graph). Furthermore, an output and input channel could was designedby Leonard Szubowicz, Rod McGuire, and also be synchronized, allowing for such features as re E. R. Wiley and implemented with the collaboration of sampling a file with different characteristics (e.g., sam Richard Sharkany. It consists (it is still in use) of four pling rate). outputchannelsand two input channels, controllingdirect Sometimes, it is convenient to have an arbitrary audio memory access (DMA) boards and ftlled continuouslyin signal that marks the passage of a certain portion of the first inffirst out (FIFO) circuits. Memory is dynamically main signal. An example is the use of a tone to start a allocated to each active channel; the amount is trimmed clock for a timed response from a subject. Althoughthis back as other requests come in or is expanded as other could be accomplished by havinga second, synchronized channels becomeinactive. The advantageof this memory channel outputtingsuch a ••mark tone," the lack of vari management is that large memory areas make the rare ation in the signal allows for a simpler solution-and one FIFO shutdown (i.e., data did not arrive in time) even that would allow mark tones to accompany two-ehannel rarer. The advantageof FIFO organizationis that buffers output. Each output channel is thus associated with a can be ftlled with less concern for time-critical disk ac mark-tonechannel, whichallowsthe outputof an unvary cesses. A drawback is that the system does not know ex ing audio signal (a I-kHz tone, in this case) without any actly where in the output it is, since only the DMA has increase in processing load. Whenever a sample is out that information, so that the controlling computer cannot put that has the second highest bit set, a I-kHz tone, receivean exactreadingof how far the sequence has gone. 4 msec in duration, is simultaneously outputon the mark Although the speech waveform is the primary signal tone channel. This tone can be recorded along with the of interest at this laboratory, other analog signals, such main signal, allowing (for example) the synchronization as the output of transducers measuring the speech articu of the main signal with other devices (e.g., a reaction lators and muscles (electromyographic [EMG] signals), timer). Since the mark tone is essentially part of the data are also used. Many such signals are more restricted in stream, it does not impose any further load on the sys the frequency domain and, thus, can be represented ade tem: The secondhighestbit is part of the 16-bitword that quately at slower sampling rates. The lower the rate, the is stored in the computer, but not part of the 12 bits of less disk space is used. Even for speech, some purposes data. Thus, mark tones can be freely intermixed with are well served by the 1O-kHz rate, whereas others need either or both channels of synchronized output. the information between5 and 10 kHz, whichis preserved While the PCM system just described is still in use at at the 20-kHz rate. Each of the six channels can be used Haskins Laboratories, it is no longer the only system in at a 10-or 20-kHz sampling rate.

Load more