An Approach to the Implementation of a Discrete Cosine Transform
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. COM-30, NO. 4, APRIL 1982

GUIDO BERTOCCI, MEMBER, IEEE, BRIAN W. SCHOENHERR, AND DAVID G. MESSERSCHMITT, SENIOR MEMBER, IEEE

Abstract: An approach to the implementation of a discrete cosine transform (DCT) for application to coding speech is described. The approach is oriented toward single speech channel encoding. In addition, a detailed computer simulation of an adaptive transform coder is described. The purpose of the computer simulation is to determine the internal precision at various points in the implementation required to avoid subjective degradation. Specific recommendations are made on the required internal precision in the implementation of the discrete cosine transform. A breadboard implementation of the DCT using SSI and MSI TTL logic based on the results of the computer simulation is reported.

Manuscript received August 11, 1981; revised October 20, 1981. This work was supported in part by the National Science Foundation under Grant 78-16966 and by GTE Lenkurt. The work of G. Bertocci and B. W. Schoenherr was performed in partial fulfillment of the M.S. degree from the University of California, Berkeley, CA.
G. Bertocci is with Bell Laboratories, Holmdel, NJ 07733.
B. W. Schoenherr is with Bell Laboratories, North Andover, MA 08145.
D. G. Messerschmitt is with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720.
0090-6778/82/0400-0635$00.75 © 1982 IEEE

I. INTRODUCTION

Adaptive transform coding is, together with subband coding, a promising method of encoding speech at bit rates below 16 kbits/s [1]. A significant obstacle to the widespread application of transform coding is, however, its great complexity. A computationally intensive portion of the transform coder is the front end discrete cosine transform (DCT).

In this paper an architecture for the implementation of a DCT is recommended. A detailed computer simulation of a transform coder, including the bit allocation algorithm as well as the DCT, was performed for the purpose of determining the required internal finite wordlength precisions. This is a critical problem, since choosing too high a precision complicates the implementation, and insufficient precision will result in degradation of speech quality beyond that inherent in the encoding technique. The simulation was carefully designed to accurately reflect these finite precision effects. The simulation was run on actual speech, followed by informal listening tests to determine the effects of insufficient precision.

This paper does not consider the implementation of the bit allocation algorithm in a transform coder. The bit allocation method used in the simulator was that recommended in [1]. The design of the bit allocation portion of the coder is the most challenging part, particularly from an algorithmic standpoint.

As a ground rule in this study, it was assumed that a single channel was being encoded, as opposed to the encoding of a large group of channels simultaneously. The available circuit techniques and device speeds were assumed to be constrained by those available in MOS-LSI. The DCT was actually implemented using SSI and MSI TTL parts using the architecture recommended here.

The proposed implementation architecture is described in Section II, and the simulation and results are described in Section III. A brief description of the breadboarded DCT is given in Section IV.

II. AN APPROACH TO IMPLEMENTATION OF THE DCT CODER

It has been recommended that a discrete cosine transform (DCT) [2] is the most appropriate fixed (nonadaptive) transform for speech signals. It is given by

[Equation (2.1) is illegible in the source apart from the upper summation limit N − 1.]

where X(m), 0 ≤ m < N, are the N speech samples in the block, and G_x(k) are the transform coefficients. The inverse DCT is given by an analogous equation.

Several approaches to the implementation of the DCT were considered, including the straightforward implementation of (2.1) using either an entirely digital or a partially analog approach using switched-capacitor techniques. "Fast DCT" algorithms which have been proposed [3] were also considered. At a sampling rate of 8 kHz, the multiply rate for a straightforward implementation of (2.1) is only a modest one million per second for the recommended value of N = 128 [1]. Thus, the added control complexity of a fast algorithm is not justified for a single channel transform coder (although it would be valuable in a multichannel application). For a single channel coder, the switched-capacitor techniques were not estimated to offer an appreciable die area advantage over an all-digital implementation, but would offer a significantly more difficult design. As a result we settled on a straightforward digital implementation of (2.1).

There are two possible methods of calculating the N transform coefficients from N successive speech samples.

1) Calculate all N transform coefficients simultaneously, keeping, at any time, N partial accumulations. As each speech sample arrives, update each of the N partial accumulations.

2) Store in memory a block of N speech samples as they arrive. Simultaneously, calculate the transform coefficients of the previously stored block. This is done by calculating one transform coefficient per sampling interval, using all N stored speech samples.

The considerations in choosing one of these methods are as follows.

1) Method 2 requires storage locations for 2N speech samples, including those currently being received, as well as the last block on which the transform coefficients are currently being calculated. Method 1 requires no storage of speech samples.

2) Method 1 requires the storage of N partial accumulations, while method 2 has no partial accumulations. Both methods require memory for N transform coefficients for storage while the adaptive quantization side information is being calculated.

3) Both methods result in 2N sample periods of delay. In the case of method 1, the first block is used to calculate the transform coefficients, and the second to calculate the side information. For method 2, the first block is used to store the N speech samples, and the second is used to calculate both the transform coefficients and the side information.

4) Method 2 is compatible with adaptive feedforward quantization [1], in which a quantization scaling factor to be sent to the receiver as side information is calculated for the block of speech samples before the transform coefficients are calculated. This adaptive quantization results in a normalization of the transform coefficients, resulting in a reduction in the number of bits of precision in the accumulation and storage of transform coefficients and in the subsequent adaptive quantization algorithms which operate on a per-transform coefficient (as opposed to block) basis. In method 1, on the other hand, the transform coefficients are calculated prior to reception of the entire block of speech samples. Thus the accumulation requires greater precision.

In view of these considerations, the fourth point was considered overriding, and method 2 was chosen. This choice also results in fewer bits of required memory, in view of the fewer number of bits of precision required for speech samples as compared to partial accumulations and transform coefficients which have not been adaptively quantized.

This choice results in the configuration of Fig. 1. It is assumed that the speech is first encoded using the μ = 255 encoding law commonly used in telephone networks. This type of coder is widely available in monolithic form and has adequate precision and dynamic range for this application. As the speech samples are read into an N-sample buffer, the adaptive feedforward quantization algorithm (the box labeled "adaptive gain") calculates some measure of the average speech power. This measure is then mapped into a scale factor in the box labeled "scale," which is applied before accumulation of the transform coefficients and is also transmitted as side information. As the samples are read from the N-sample buffer, they are converted into a true floating point representation (which is very similar to μ = 255). Prior to accumulation, we must multiply by the values of the cosine as in (2.1). Since the cosine has a well-defined level, it can be stored in ROM in a fixed point representation. The multiply with the speech sample can be performed only on the mantissa of the speech sample floating point representation. The resulting values are then normalized by adding the previously determined scale factor to the exponent, and the resulting value is converted to fixed point prior to accumulation. The accumulation of the N values then determines one transform coefficient as in (2.1). The accumulation of these N values occurs during one speech sampling interval, and the N-sample buffer is read N times in order to determine the N transform coefficients.

Fig. 1. Block diagram of a per-channel transform coder.

As the N transform coefficients are calculated, they are stored in an N-sample buffer. Simultaneously, the adaptive quantization side information is calculated. Then, as the N transform coefficients are read from the buffer, they are coded using the appropriate step size and number of bits of precision, as determined by the side information.

Several aspects of the transform coder implementation deserve to be discussed in more detail. In the following sections we discuss the adaptive gain algorithm, the μ = 255 to floating point conversion, and the generation of the cosine.

A. μ = 255 to Floating Point Conversion

The conversion of a μ = 255 sample to floating point to expedite a subsequent multiplication has been used in [4]. In this section the details of this conversion are developed. The μ = 255 output level is given by [5]

$$X = 2^{L}(V + 16.5) - 16.5 \tag{2.2}$$

where X is the output (analog) level corresponding to μ = 255 code (L, V), L is the segment number, 0 ≤ L < 8, and V is the level on a segment, 0 ≤ V < 16.