


TIME-DOMAIN INTERPOLATION OF CLIPPED SPEECH AND

THE EFFECTS ON LINEAR PREDICTIVE ANALYSIS

by

Barbara Lynn Bunch League

A Thesis Submitted to the Faculty of the

DEPARTMENT OF ELECTRICAL ENGINEERING

In Partial Fulfillment of the Requirements For the Degree of

MASTER OF SCIENCE

In the Graduate College

THE UNIVERSITY OF ARIZONA

1983

STATEMENT BY AUTHOR

This thesis has been submitted in partial fulfillment of requirements for an advanced degree at The University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.

Brief quotations from this thesis are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.

SIGNED

APPROVAL BY THESIS DIRECTOR

This thesis has been approved on the date shown below:

[Signature]                                        Date
Associate Professor of Electrical Engineering

TABLE OF CONTENTS

LIST OF ILLUSTRATIONS

LIST OF TABLES

ABSTRACT

CHAPTER

1. INTRODUCTION

2. BRIEF REVIEW OF THE LPC METHOD

3. DISTORTION MEASURES

4. PROCEDURE
   Log Spectra Measurements
   Log Area Ratio Measurements

5. RESULTS
   Specific Cases
   Threshold Levels
   Distance Measures

6. CONCLUSION

LIST OF REFERENCES

LIST OF ILLUSTRATIONS

Figure

1. Block diagram of a functional model of speech production based on the linear prediction representation of the speech wave

2. Basic flow of interpolation routine

3. Weighting function

4. Speech waveforms for frames 36 and 37 at 20% clipping threshold

5. Speech waveforms for frames 40 and 41 at 10% clipping threshold

6. Speech waveforms for frames 14 and 15 at 5% clipping threshold

7. Speech waveforms for frames 26 and 27 at 20% clipping threshold

8. Power spectra for frame 36 at 20% clipping threshold

9. Power spectra for frame 40 at 10% clipping threshold

10. Power spectra for frame 14 at 5% clipping threshold

11. Power spectra for frame 26 at 20% clipping threshold

12. Power spectra for frame 25 at 20% clipping threshold

13. Interpolated and clipped speech scatter diagram at 10% clipping level

LIST OF TABLES

Table

1. Euclidean and log spectra distances for 10 percent clipping level

ABSTRACT

A time-domain interpolation method was explored as a way to enhance speech quality degraded by clipping. Digital speech samples were artificially clipped. A low-order polynomial was used to interpolate the clipped regions. The effects of clipping and interpolation on linear predictive spectrum modeling were studied. A Euclidean distance on log area ratios was computed to measure model differences. A log magnitude distance measure on the modeled spectra was used to directly assess spectral distortion. Frequency weighting was performed before applying the spectral distance measure. Informal listening tests were used to evaluate perceptual distortion. Results indicate that for some portions of the speech the interpolation method yielded improvement; for other portions it did not. Overall, signal quality was not enhanced.

Results also suggest using a Euclidean distance on log area ratios as an effective and efficiently computed spectral distortion measure.

CHAPTER 1

INTRODUCTION

Equipment limitations and inadequate control of the speaker's environment can allow speech signals to clip. The effect of clipping on speech quality depends on the speech processing system and its intended use. Waveform coders and model coders are two systems used in communication. Waveform coders digitally transmit a good reproduction of the actual waveform. Model coders estimate and transmit a linear model of the speech production process rather than the actual waveform, usually with the goal of preserving the power spectral density. Model coding permits communication at very low bit rates, but waveform coding provides more natural speech and more robustness against speaker variations, multiple speakers, and background noise. Since severely clipped speech is highly intelligible [1,2], waveform coders also withstand clipping degradation. Model coders, on the other hand, do not. When analyzing clipped speech, the severity of the distortion introduced into these systems will depend on the coding process used.

The technique of linear predictive coding (LPC) has been used in many speech coding systems, including pitch excited vocoders [3], voice-excited vocoders [4], adaptive predictive vocoders [5], and adaptive transform coders [6]. In each of these speech coding methods, the speech signal is first divided into frames and a set of LPC parameters is extracted from each frame. These extracted parameters are then coded and transmitted as part of the coding procedure. Traditionally, each parameter is coded independently of other parameters in the same frame and adjacent frames. This scalar quantization method achieves a bit rate of 2400 bits per second. At a typical frame rate [7] of 44.4 frames/sec, 54 bits are available to encode each set of LPC parameters: 41 bits for filter coefficients, 6 bits for pitch, 1 bit for voicing, 5 bits for gain, and 1 bit for synchronization.

Current research has shown that the present bit rate of 2400 bps for LPC systems can be significantly reduced if vector quantization is used [7,8]. This method utilizes interparametric coupling and source statistics to achieve the bit reduction. Application of vector quantization can result in an 800 bps speech coder with no change to the LPC design other than the quantization algorithms for the LPC parameters (i.e., filter coefficients, pitch, voicing, and gain). Furthermore, the method retains acceptable intelligibility and naturalness and yields smoother transitions [7]. The key to attaining an 800 bps rate is the reduction from 41 bits to 10 bits for quantizing each frame of filter coefficients. This is achieved by first generating a 10-bit vector quantization codebook for the filter coefficients, and then applying a distortion (or distance) measure between the codebook coefficients and the sampled speech coefficients. The codebook coefficients producing the minimum distortion are the transmitted parameters.
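To make the search concrete, the following sketch (illustrative Python, not part of the original thesis; the codebook and frame values are hypothetical stand-ins) selects the minimum-distortion codeword using a Euclidean distance on log area ratio vectors:

    import numpy as np

    # Hypothetical 10-bit codebook: 1024 codewords, each a vector of
    # p = 15 log area ratios. A real codebook would be trained on speech.
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(1024, 15))
    frame_lars = rng.normal(size=15)        # stand-in for one frame's parameters

    # Distortion between the frame and every codeword; transmit the
    # index (10 bits) of the codeword with minimum distortion.
    distances = np.sqrt(((codebook - frame_lars) ** 2).sum(axis=1))
    best_index = int(np.argmin(distances))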

The vector quantization method is also used in LPC voice recognition systems [3,9]. Recognition capability is achieved by constructing a template for each vocabulary word. The resulting template set typifies and spans a large population of speaker word reference templates. A distance score obtained by processing an unknown input utterance and comparing it with the template ensemble identifies the utterance.

The LPC quantization methods are dependent on the input of good quality speech. Vector quantization in particular could be susceptible to gross error if the input speech were clipped. This thesis explores the possibility of reducing the distortional effects of clipping in LPC analysis by using a low order polynomial to interpolate clipped regions. The method is a time-domain method, and it was chosen for two reasons. First, use of a low-order polynomial allows close to real-time operation by replacing saturated values with interpolated values immediately after clipping has occurred. This allows uninterrupted processing of each speech frame as it enters the system. Second, in general only vowel sounds produce the high amplitudes susceptible to clipping. Since vowel waveforms are slowly varying, it was reasoned that a second or third order polynomial fit in a clipped region would adequately approximate the waveform. To the author's knowledge, the particular method used in this study has not been researched before, nor have similar time-domain methods been studied. The common procedure for controlling speech clipping is automatic gain control. That method is imperfect, however, in that it allows clipped speech to enter the coding system until the gain compensation takes effect.

For the study, the distortional effects introduced by clipping and interpolation were assessed by measuring perturbations in the LPC parameters known as log area ratios. In addition, perturbations in the modeled spectrum of each speech frame were also measured. By modeled spectrum we mean the LPC-produced spectral envelope that approximates the true speech spectrum. The log area ratios were chosen for the following reasons: (1) they are a direct reference to the LPC method and reflect model differences; (2) they possess flat spectral sensitivity; (3) they are a scalar or vector quantized parameter commonly transmitted or template matched; and (4) differences between them can be computed efficiently. Modeled log spectra were compared to provide direct spectral distortion measurements and to assess the validity of using log area ratio differences as spectral distortion indicators.

The quantitative results presented in this thesis are limited to an examination of the voiced utterance "may we," spoken by an adult female. Two additional utterances ("may we" and "yellow lion"), spoken by an adult male, were included in informal perceptual evaluations.

CHAPTER 2

BRIEF REVIEW OF THE LPC METHOD

Speech sounds are produced by acoustical excitation of the human vocal tract. Voiced speech occurs when the tract is excited by a series of nearly periodic pulses generated by the vocal cords. Unvoiced speech results from air passing turbulently through a constriction in the tract. A simple and sufficiently accurate model of the speech production mechanism can be realized by representing the vocal tract as a discrete time-varying linear filter. For nonnasal voiced sounds, the transfer function possesses no zeros, enabling the tract to be modeled by an all-pole (recursive) filter. For unvoiced and nasal sounds, the filter usually includes the vocal tract antiresonances (zeros) as well as its resonances (poles). However, since the zeros for unvoiced and nasal sounds lie within the unit circle in the z-plane, each factor in the transfer function numerator can be approximated by multiple poles in the denominator. In addition, a pole's location is considerably more important perceptually than a zero's; in most cases a zero contributes only to the spectral balance. As a result, using zeros to explicitly represent the antiresonances is not necessary. An all-pole model can approximate antiresonance effects to any desired accuracy.

A method successfully used in modeling the speech production process is linear prediction. The LPC technique produces a model combining glottal flow, vocal tract, and lip radiation contributions into a single all-pole filter. Figure 1 illustrates the speech signal in sampled data form.


Figure 1. Block diagram of a functional model of speech production based on the linear prediction representation of the speech wave (after Atal and Hanauer, 1971 [10]).

A pulse generator with adjustable period and amplitude produces the vocal cord excitation for voiced sounds. A white-noise source produces the noise-like excitation of unvoiced sounds. At the input, the linear predictor P, a transversal filter with p delays, forms a weighted sum of the past p samples. At the nth sampling instant, the filter output is given by

$$s_n = \sum_{k=1}^{p} a_k s_{n-k} + \delta_n$$

where the "predictor coefficients" a_k account for the vocal tract, the radiation, and the glottal flow filtering action, and delta_n is the excitation input.

The filter's transfer function is given by

$$H(z) = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$

The p poles are either real or occur in complex conjugate pairs, and must lie inside the unit circle to produce a stable filter. The number of vocal tract resonances and antiresonances in the frequency range of interest, the nature of the glottal volume flow function, and the lip radiation determine the number of coefficients p required to adequately represent any speech segment.
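As a numerical illustration of the stability condition (a minimal sketch, not from the thesis), the poles of H(z) can be computed and tested directly:

    import numpy as np

    def is_stable(a):
        """True if all poles of H(z) = 1/(1 - sum_k a_k z^-k) lie
        strictly inside the unit circle; `a` holds a_1 ... a_p."""
        # Multiplying the denominator by z^p gives the polynomial
        # z^p - a_1 z^(p-1) - ... - a_p, whose roots are the poles.
        poles = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
        return bool(np.all(np.abs(poles) < 1.0))

    print(is_stable([0.5, -0.3]))   # True
    print(is_stable([2.0]))         # False: pole at z = 2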

During speech production, the vocal tract shape continuously changes. Over a time interval during which the tract shape is assumed constant, a complete representation of the speech wave is produced by the predictor coefficients a_k, the pitch period, the rms value of the speech samples, and a binary parameter indicating whether the speech is voiced or unvoiced. "In most cases, it is sufficient to readjust these parameters periodically, for example, once every 5 to 10 msec" [10].

Due to the relatively high accuracy requirements (8-10 bits per coefficient) and the fact that linear interpolation of the parameters does not guarantee filter stability, the predictor coefficients are rarely used as transmission parameters. Instead, transforms on the set are performed to obtain alternative representations which are more convenient for storage and transmission. A very important transformation is the set of PARCOR coefficients, or reflection coefficients (k_i). The coefficients are bounded in magnitude by unity, and linear interpolation between stable filters by means of these parameters will produce stable filters. As a result, the reflection coefficients are commonly used in linear prediction vocoding. It has been shown, however, that the coefficients possess non-uniform spectral sensitivity. Values of k_i near unity are the most sensitive to slight spectral variations. Consequently, linear quantization of the k_i's over the range [-1, 1] is wasteful. Compensating for the non-uniform spectral sensitivity requires non-linear quantization. The most efficient transformation is a logarithmic coding given by:

$$g_i = 10 \log_{10}\left(\frac{1 + k_i}{1 - k_i}\right)$$

The resulting g_i parameters, known as log area ratios, are an approximately optimal set of coefficients for quantization. They possess relatively flat spectral sensitivity curves and retain filter stability over the range $-\infty < g_i < \infty$ for all i. The filter becomes unstable only if the log area ratios become unbounded. Therefore, numerical errors introduced in the ratios cannot result in instability. For these reasons, the log area ratios are widely used in linear prediction vocoding.
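The transformation and its inverse are inexpensive to compute. A minimal sketch (illustrative Python, not code from the thesis):

    import numpy as np

    def lar_from_reflection(k):
        """g_i = 10 log10((1 + k_i)/(1 - k_i)), valid for |k_i| < 1."""
        k = np.asarray(k, dtype=float)
        return 10.0 * np.log10((1.0 + k) / (1.0 - k))

    def reflection_from_lar(g):
        """Inverse mapping: k_i = (10^(g_i/10) - 1)/(10^(g_i/10) + 1)."""
        r = 10.0 ** (np.asarray(g, dtype=float) / 10.0)
        return (r - 1.0) / (r + 1.0)

    k = np.array([0.9, -0.4, 0.1])
    g = lar_from_reflection(k)
    print(reflection_from_lar(g))   # recovers k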

CHAPTER 3

DISTORTION MEASURES

In order to assess the distortion introduced by clipping and interpolation in LPC analysis, a dissimilarity or distortion measure was needed. A distortion measure assigns a non-negative number to an input/output pair of a system. The measure is classified as objective if it is computed from data which contain no human subjective responses. In speech processing a useful objective distortion measure should be: (1) subjectively meaningful, in that small and large distortions correspond to good and bad subjective quality; (2) amenable to mathematical analysis and leading to practical design techniques; and (3) efficiently computed [11]. If the objective measure conforms to the following conditions, it is called a metric:

1. $F(x,y) = F(y,x)$

2. $F(x,y) = 0$ if $x = y$; $F(x,y) \neq 0$ if $x \neq y$

3. $F(x,y) \leq F(x,z) + F(z,y)$

where x, y, and z are the variables measured. "Although metrics have many features which are desirable in a fidelity measure, an objective measure need not be a metric to be of interest" [12].

One of the major problems in evaluating high quality, low bit rate speech coding systems is the lack of good objective measures for predicting speech quality. Numerous measures have been suggested and used, but most were chosen more for their analytical properties than for any demonstrated high correlation with subjective quality measures. "This is because the testing of any single objective quality measure against an appropriately large data base of subjective quality results is quite expensive due to the cost of conducting careful subjective tests" [13]. Some of the proposed measures include signal-to-noise ratios, arithmetic and geometric spectral distance measures, cepstral distance measures, and various parametric distance measures from LPC. "Currently there does not exist an objective measure which is both highly correlated with subjective measures over all possible distortions and which is compactly computable" [13].

In LPC systems, root mean square (rms) log spectral error and Itakura's maximum likelihood ratio measure are the two commonly used spectral distance measures. Both measures have been shown to produce good results in correlating spectral distortion with perceptual distortion [11,14,15,16]. The rms log spectral error is given by:

$$d_{rms} = \left[\frac{1}{2\pi}\int_0^{2\pi}\left[20\log_{10} f_1(\theta) - 20\log_{10} f_2(\theta)\right]^2 d\theta\right]^{1/2}$$

and the maximum log likelihood measure by:

$$d_{mLL}(c_1 f_1, c_2 f_2) = \log_{10}\left[\frac{1}{2\pi}\int_0^{2\pi}\frac{c_2^2\, f_2^2(\theta)}{c_1^2\, f_1^2(\theta)}\,d\theta\right]$$

where f is the magnitude spectrum of the all-pole LPC filter, c is the gain, and theta is the normalized frequency.

Although most easily defined in the spectral domain, the maximum likelihood ratio measure is evaluated in the parameter domain using the following [3]:

$$d_{mLL} = \log_{10}\left(\frac{A V A^T}{B V B^T}\right)$$

where

$A = [1, -a_1, -a_2, \ldots, -a_p]$ (LPC coefficients for undistorted speech)

$B = [1, -b_1, -b_2, \ldots, -b_p]$ (LPC coefficients for distorted speech)

$V$ = the $(p + 1) \times (p + 1)$ autocorrelation matrix of the distorted speech.
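A small numerical sketch of this parameter-domain evaluation (illustrative Python; the signal and coefficient values are made up, and the autocorrelation matrix is built as a Toeplitz matrix from lags 0..p):

    import numpy as np

    def itakura_mll(a, b, V):
        """d = log10((A V A^T)/(B V B^T)), with A = [1, -a_1, ..., -a_p]."""
        A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
        B = np.concatenate(([1.0], -np.asarray(b, dtype=float)))
        return np.log10((A @ V @ A) / (B @ V @ B))

    # Autocorrelation matrix of a made-up "distorted" signal, order p = 2.
    x = np.sin(0.3 * np.arange(256))
    r = np.array([np.dot(x[:x.size - i], x[i:]) for i in range(3)])
    V = r[np.abs(np.arange(3)[:, None] - np.arange(3)[None, :])]  # Toeplitz

    print(itakura_mll([0.5, -0.2], [0.4, -0.1], V))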

Of the LPC parametric distortion measures, a Euclidean distance on log area ratios is commonly used [17,18,19]. The measure is given as:

$$d(g) = \left[\sum_{i=1}^{p}\left(g_{1i} - g_{2i}\right)^2\right]^{1/2}$$

Log area ratios are the chosen parameters for reasons discussed in Chapter 2. Weighting of the ratios is possible, but results indicate that it yields no improvement [17]. Of the above three measures, the log area ratio distance correlates the least with perceptual distortion but requires significantly less computation time.

For this study, the rms log spectral error and the log area ratio Euclidean distance were chosen to measure the dissimilarity between original and clipped and between original and interpolated speech. The maximum log likelihood is computationally more efficient than the rms log spectral measure, but the two are evaluated in different domains. For the study it was necessary to obtain plots of the modeled spectra. This required computing each frame's model spectrum. As a result, evaluating the rms log spectral error did not present any additional computational burden. Furthermore, for reasons to be discussed, it was decided to perform frequency weighting before applying a spectral distortion measure. It was convenient, therefore, to work directly in the frequency domain. In the parametric domain, a Euclidean distance on log area ratios was easier to implement and computationally quicker than the maximum likelihood measure. Viswanathan, Makhoul and Huggins recommend using the log area ratio distance over the maximum likelihood ratio measure because the latter requires about 50 times more computation time and does not produce significantly better measurement quality [17].

Frequency weighting was applied to the rms log spectral error measure. The justification for this weighting rests in perceptual tests and physiology. A speech signal is frequency analyzed at the basilar membrane of the inner ear. The frequency scale of the basilar membrane is not linear but follows what is known as the "mel scale," which is roughly linear below 1000 Hz and mostly logarithmic above 1000 Hz [20]. A unit length of the basilar membrane covers a wider frequency range at high frequencies than at low frequencies. It is believed that several resonances in the high frequency region are audibly perceived as a single band of energy concentration. Speech perception studies have shown that low frequencies are perceptually more important than high frequencies [21]. Only two or three formants, or vocal tract natural resonances, are deemed necessary and effective for speech perception, especially of vowels [20]. In general, the first three formants reside in the frequency range 0-3000 Hz. In ordinary LPC, the speech spectral envelope is modeled by an all-pole spectrum. The error criterion employed guarantees a uniform fit across the whole frequency range. High and low frequencies are thus given equal importance.

For those reasons it was decided to emphasize spectral dissimilarity occurring in the low frequency region of the power spectra. This was accomplished by weighting lower frequencies more heavily than higher frequencies. The weighting function that was used reflected the "mel scale" response of the basilar membrane.

CHAPTER 4

PROCEDURE

The following speech segments were low-pass filtered to 5 kHz, sampled at 10,000 samples per second, and stored on disk as 16 bit/sample sequences: "may we," spoken by an adult female; "may we" and "yellow lion," spoken by an adult male. The phrases are representative of voiced speech. To obtain values used as clipping thresholds, a computer program was written to determine the amplitude level exceeded by a specified percentage of the total number of sample points. Values corresponding to 1, 5, 10, and 20 percent were obtained from the speech segments.
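A minimal sketch of this threshold computation and of the artificial clipping described next (illustrative Python, not the thesis program; `speech` stands in for one of the 10 kHz sampled sequences):

    import numpy as np

    def clip_threshold(speech, percent):
        """Amplitude level exceeded in magnitude by `percent` of the
        total number of sample points."""
        return np.percentile(np.abs(speech), 100.0 - percent)

    def hard_clip(speech, threshold):
        """Replace points beyond +/- threshold with the threshold value."""
        return np.clip(speech, -threshold, threshold)

    t = np.arange(2560) / 10000.0                   # 10 kHz time base
    speech = np.sin(2.0 * np.pi * 200.0 * t)        # stand-in for real speech
    threshold_10 = clip_threshold(speech, 10.0)     # 10 percent level
    clipped = hard_clip(speech, threshold_10)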

Artificial clipping was then performed. This consisted of replacing all negative and positive data points exceeding the threshold level by the threshold value. Thus, artificial clipping occurred on both negative and positive amplitudes. At this point two types of speech segments existed, namely the original and the clipped. A third speech type was obtained using a program written to interpolate clipped speech regions. The interpolation procedure consisted of replacing saturated data points with interpolated values obtained using a second or third order polynomial fit to the local region of saturation. Figure 2 illustrates the basic flow of the program, and is explained in the following paragraphs.

A saturated point was replaced by a value obtained from a third order polynomial, a second order polynomial, or by the threshold value.

[Figure 2 is a flowchart of the interpolation routine: speech samples are transferred from the original file to the interpolated file until a saturated region is encountered; the region's length is counted; the two waveform values immediately to the left of the region (y1, y2) and the two immediately to the right (y3, y4) are collected with their corresponding times x1-x4; if all four points qualify (proper slope direction, one inflection point), a third order polynomial is fitted by Gaussian elimination; if only three points qualify, a second order polynomial is fitted; otherwise the saturated values are replaced by the threshold value. Interpolated values are used only where they exceed the threshold.]

Figure 2. Basic flow of interpolation routine.

For third order interpolation it was necessary to have two unsaturated sample points immediately to the left of the clipped region and two immediately to the right. In addition, the points had to produce the proper slope direction, and inequality was required between the first and second points and between the third and fourth. These restrictions guaranteed that the polynomial-produced function was oriented in the proper direction and possessed only one inflection point. If the requirements were met, Gaussian elimination was performed on the following equation set:

$$a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 = y_i, \qquad i = 1, 2, 3, 4$$

where $y_1, y_2, y_3, y_4$ are the sampled data points and $x_1, x_2, x_3, x_4$ the time values corresponding to the data points. The coefficients obtained, $a_0$ through $a_3$, were then used to perform a third order polynomial interpolation of the clipped region.

If all the requirements for third order interpolation were not met, the program defaulted to second order interpolation and ran similar checks. Passing these requirements induced Gaussian elimination on the following equation set:

$$a_0 + a_1 x_i + a_2 x_i^2 = y_i, \qquad i = 1, 2, 3$$

where $y_1, y_2, y_3$ are the acceptable data points. The coefficients obtained, $a_0$, $a_1$, $a_2$, were used to perform second order polynomial interpolation of the clipped region.

Interpolation was not performed if the second order checks also failed. Instead, all saturated data points were left unchanged. When interpolation did occur, only interpolated values exceeding the threshold replaced saturated data points; otherwise, the threshold value was kept.
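A compact sketch of the fallback scheme just described (illustrative Python; the qualification checks are reduced to having the required number of anchor points, and a positively saturated run is assumed):

    import numpy as np

    def interpolate_region(x_anchor, y_anchor, x_sat, threshold):
        """Fit a third order polynomial through four anchor samples
        flanking a saturated run (or second order through three), then
        keep fitted values only where they exceed the clip level."""
        n = len(x_anchor)                # 4 anchors -> cubic, 3 -> quadratic
        V = np.vander(x_anchor, n, increasing=True)
        coeffs = np.linalg.solve(V, y_anchor)        # Gaussian elimination step
        fitted = np.vander(x_sat, n, increasing=True) @ coeffs
        return np.where(fitted > threshold, fitted, threshold)

    # Anchors at x = 0, 1 (left) and 5, 6 (right) of a run clipped at 1.0.
    x_anchor = np.array([0.0, 1.0, 5.0, 6.0])
    y_anchor = np.array([0.6, 0.9, 0.8, 0.5])
    print(interpolate_region(x_anchor, y_anchor, np.array([2.0, 3.0, 4.0]), 1.0))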

Following interpolation, fifteen-pole LPC analysis was performed on the original, clipped, and interpolated speech segments. In the study, the autocorrelation method of LPC analysis was used [3]. The output parameters from the method were reflection coefficients. A set of coefficients was obtained for each speech frame (a frame consisted of 256 data points multiplied by a Hamming window). Successive, overlapping frames occurred at displacements of 128 sample times.
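A sketch of this analysis step (illustrative Python, not the thesis program): the autocorrelation method with the Levinson-Durbin recursion, which produces the reflection coefficients directly as a by-product.

    import numpy as np

    def lpc_analysis(frame, p=15):
        """Autocorrelation-method LPC on one Hamming-windowed frame;
        returns (reflection coefficients, predictor coefficients)."""
        w = frame * np.hamming(frame.size)
        r = np.array([np.dot(w[:w.size - i], w[i:]) for i in range(p + 1)])
        alpha = np.zeros(p + 1)             # alpha[1..i] valid at step i
        k = np.zeros(p)
        e = r[0]
        for i in range(1, p + 1):           # Levinson-Durbin recursion
            k[i - 1] = (r[i] - np.dot(alpha[1:i], r[i - 1:0:-1])) / e
            new = alpha.copy()
            new[i] = k[i - 1]
            new[1:i] = alpha[1:i] - k[i - 1] * alpha[i - 1:0:-1]
            alpha = new
            e *= 1.0 - k[i - 1] ** 2
        return k, alpha[1:]

    # 256-point frames advanced by 128 samples, as in the study:
    # k, a = lpc_analysis(speech[start:start + 256])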

Log Spectra Measurements

A program was written to calculate the predictor coefficients (a_k) from the reflection coefficients. After obtaining the predictor coefficients, a log magnitude spectrum for each speech frame was generated using the equation:

$$\log_{10} P(\omega) = \log_{10}\left[\frac{G^2}{r(0) + 2\sum_{i=1}^{p} r(i)\cos(i\omega)}\right]$$

where

$P(\omega)$ = the power spectrum

$r$ = the autocorrelation of the impulse response of the inverse filter

$a_i$ = a predictor coefficient, with $a_0 = 1$, $0 \le i \le p$

For a fifteen-pole model, p = 15. The gain, G, is arbitrary in the treatment here. To obtain adequate frequency resolution, $P(\omega)$ was evaluated at the increments $\omega(i) = i\pi/100$, $0 \le i \le 100$, so that $0 \le \omega \le \pi$.
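A sketch of this evaluation (illustrative Python; `a` holds the predictor coefficients a_1 ... a_p and the arbitrary gain G is taken as 1):

    import numpy as np

    def log_power_spectrum(a, n=101):
        """log10 P(w) at w = i*pi/(n-1), from the autocorrelation of the
        inverse filter [1, -a_1, ..., -a_p], with G = 1."""
        inv = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
        p = inv.size - 1
        r = np.array([np.dot(inv[:p + 1 - i], inv[i:]) for i in range(p + 1)])
        w = np.pi * np.arange(n) / (n - 1)
        denom = r[0] + 2.0 * sum(r[i] * np.cos(i * w) for i in range(1, p + 1))
        return w, -np.log10(denom)          # log10(G^2 / denom) with G = 1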

The spectra were plotted, and a numerical integral of the distances between the original and clipped spectra and between the original and interpolated spectra was computed. Before applying the distance measure, frequency weighting was performed using the function [22]:

$$W(f) = \frac{2595}{f + 700}, \qquad 0 \le f \le 5000 \text{ Hz}$$

The weighting function was obtained by taking the derivative of an equation approximating the mel-frequency relation. The resulting function weighted low frequencies more heavily than high frequencies. Figure 3 shows the weighting function.

Figure 3. Weighting function. [Plot of W(f) = 2595/(f + 700) over 0-5 kHz.]

Spectral distances were then obtained using the following:

$$d(P(\omega)) = \left[\frac{1}{100}\sum_{i=1}^{100}\left[\ln P_1(\omega_i) - \ln P_2(\omega_i)\right]^2\right]^{1/2}$$
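One plausible realization of the weighted distance (a sketch; it assumes both log spectra are sampled at the same 101 frequencies spanning 0-5 kHz and applies W(f) to the difference, which is one reading of "weighting before applying the distance measure"):

    import numpy as np

    def weighted_rms_log_distance(ln_p1, ln_p2, f_max=5000.0):
        """RMS difference of two log spectra with the mel-slope weight
        W(f) = 2595/(f + 700) applied before differencing."""
        ln_p1, ln_p2 = np.asarray(ln_p1), np.asarray(ln_p2)
        f = np.linspace(0.0, f_max, ln_p1.size)
        w = 2595.0 / (f + 700.0)
        return np.sqrt(np.mean((w * (ln_p1 - ln_p2)) ** 2))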

Log Area Ratio Measurements

From each reflection coefficient set, a log area ratio set was generated using:

$$g_i = 10 \log_{10}\left(\frac{1 + k_i}{1 - k_i}\right)$$

A Euclidean distance was applied between each original and corresponding clipped log area ratio set, and between each original and corresponding interpolated log area ratio set. The distance measure used was:

$$d(g) = \left[\sum_{i=1}^{p}\left(g_{1i} - g_{2i}\right)^2\right]^{1/2}$$
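In code, this measure is nearly a one-liner (sketch, computed directly from two reflection-coefficient sets):

    import numpy as np

    def lar_distance(k1, k2):
        """Euclidean distance between the log area ratio sets derived
        from two reflection-coefficient vectors k1 and k2."""
        g1 = 10.0 * np.log10((1.0 + np.asarray(k1)) / (1.0 - np.asarray(k1)))
        g2 = 10.0 * np.log10((1.0 + np.asarray(k2)) / (1.0 - np.asarray(k2)))
        return float(np.sqrt(np.sum((g1 - g2) ** 2)))

    print(lar_distance([0.9, -0.4, 0.1], [0.85, -0.35, 0.05]))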

Scatter diagrams of rms log spectral error versus Euclidean log area ratio distance were produced to assess the effectiveness of using the latter as a measure of spectral distortion.

CHAPTER 5

RESULTS

Specific Cases

Superimposed time domain plots of the original, clipped, and interpolated speech reveal the following cases:

Case 1: Interpolation produced speech similar to the original. This occurred if the original possessed the smooth characteristics of a polynomial or a trigonometric function. Figure 4 illustrates this case. The rounded peaks of the original speech were well matched by the polynomial function. In the example shown, third order polynomial interpolation was used for every saturated region. The speech varied slowly enough that the 10 kHz sampling frequency provided quality data points for obtaining the approximating polynomial.

Case 2: In the saturated regions the peaks of the original are flat (Figure 5). Interpolation replaced the flat peaks with rounded peaks, while clipping retained the flat characteristic. The examples studied revealed a mild threshold, so that flat clipped regions lay close to flat original regions. The speech varied slowly, enabling third order polynomial interpolation the majority of the time. The resulting interpolated peaks were not unreasonable extensions of the original speech. For these flat cases, sampling at a higher rate would probably not result in interpolation approximating the original more accurately than clipping. In the example shown, the clipping threshold is 10 percent. If the level were more severe, i.e., 20 percent, it is felt that interpolated speech would be superior to clipped speech in representing the original waveform.

Figure 4. Speech waveforms for frames 36 and 37 at 20% clipping threshold. [Superimposed waveforms for "may we," 256 sample points (25.6 msec at 10 kHz); dark line interpolated, horizontal line clipped. Frame 36 distances: interpolated LAR 4.12, spectral 2.76; clipped LAR 5.27, spectral 6.55.]

Figure 5. Speech waveforms for frames 40 and 41 at 10% clipping threshold. [Same format. Frame 40 distances: interpolated LAR 3.54, spectral 3.63; clipped LAR 0.854, spectral 0.681.]

Case 3: A wide dynamic range in the values used to generate the interpolating polynomial produced excessive overshoot in the clipped regions (Figure 6). In general, this occurred for rapidly varying speech in which data points of opposite sign were obtained and used to create the polynomial. In the example shown, the second, third, and fourth clipped regions were interpolated using a second order polynomial. For these regions, a positive value to the left of the clipping and a positive and a negative value to the right generated the functions. The large dynamic range existing between the two right data points resulted in gross overshoot. Sampling at a higher rate would lessen the dynamic ranges and significantly reduce the overshoot.

Case 4: A small dynamic range in the data points used to generate the interpolating polynomial resulted in undershooting the original speech, as shown in the example of Figure 7. In some instances, the values fell below threshold and were not used. Instead, the saturated values were replaced by the threshold value. Faster sampling and using a higher order polynomial would reduce the undershoot and improve the waveform approximation.

Figure 6. Speech waveforms for frames 14 and 15 at 5% clipping threshold. [Superimposed waveforms, 256 sample points; dark line interpolated, horizontal line clipped. Frame 14 distances: interpolated LAR 5.33, spectral 4.39; clipped LAR 1.98, spectral 1.94.]

Figure 7. Speech waveforms for frames 26 and 27 at 20% clipping threshold. [Same format. Frame 26 distances: interpolated LAR 4.89, spectral 2.89; clipped LAR 7.57, spectral 8.35.]

Comparison of the original, clipped, and interpolated modeled spectra indicates the kind of distortion that can be expected in each case.

Case 1: The interpolated spectrum reproduces the original spectrum at frequencies below about 1.5 kHz (Figures 8 and 11). Some distortion is expected due to the third formant shifting slightly right, although such formant splitting is considered to be a weak perceptual distortion; formant merging is much more severe [23]. Clipping introduces a second formant at roughly 1.1 kHz and removes the true second formant. The spectral degeneration is clearly more severe for clipping than for interpolation. The corresponding distance measures reflect the spectral differences. The rms log spectral error indicates interpolated speech is significantly better than clipped speech in retaining the original speech's low frequency characteristics. The log area ratio distance indicates the overall interpolated spectrum is better matched to the original spectrum. Since this distance was not weighted, it is felt that the higher value reflects the high frequency distortion in the interpolated spectrum.

Case 2: The clipped spectrum more accurately reproduces the original than does the interpolated spectrum (Figure 9). The flat peaks of the original speech are so close to the clipping level that the corresponding spectra are almost identical. The interpolated spectrum follows the original's general shape, but introduces a noticeable energy increase in the low frequency range 0.6-1.8 kHz. In addition, high frequency regions experience energy addition and removal. Perceptually, the spectral distortion introduced by interpolation would probably not be significant. Both distance measures clearly reflect the excellent spectral retention in the clipped speech. The interpolated distance values indicate the distortion is not severe.

Figure 8. Power spectra for frame 36 at 20% clipping threshold. [Superimposed log power spectra of the original, clipped, and interpolated frames, 0-5 kHz.]

Figure 9. Power spectra for frame 40 at 10% clipping threshold. [Superimposed log power spectra of the original, clipped, and interpolated frames, 0-5 kHz.]

Case 3: Excessive overshoot introduces a noticeable low frequency increase and a high frequency decrease (Figure 10). The majority of the energy increase occurred from 0.625 kHz to 1.875 kHz. It is not certain what effect the increase would have on perception. The second formant shifted slightly left with a reduced energy content. The third formant also lost energy. The slight threshold level enables the clipped speech to retain the perceptually important characteristics of the original. Both distance measures reflect the similarity between the clipped and original spectra. Both also indicate a significantly distorted interpolated spectrum.

Case 4: When all interpolation values were used, the interpolated speech retains the original's low frequency spectral characteristic (Figure 11). For cases where some of the interpolated values are rejected, leaving data points at the clipping level, the interpolated and clipped spectra are very similar. How well they follow the original spectrum depends on the clipping level; small clipping levels produce similar spectra. In the example shown, although the interpolated low frequency region is similar to the original, there is a definite energy reduction in the second and third formants. It is not known if the perceptual effects would be significant. The clipped spectrum, on the other hand, is severely distorted and would produce perceptual degradation. Introduction of a false second formant and complete removal of the original are considered important perceptual determinants. Both distance measures adequately portray the gross mismatch between the original and clipped spectra. As in Case 1, the higher value of the interpolated log area ratio distance is attributed to high frequency influence.

Figure 10. Power spectra for frame 14 at 5% clipping threshold. [Superimposed log power spectra of the original, clipped, and interpolated frames, 0-5 kHz.]

Figure 11. Power spectra for frame 26 at 20% clipping threshold. [Superimposed log power spectra of the original, clipped, and interpolated frames, 0-5 kHz.]

Threshold Levels

With only a one percent clipping level, both the clipped and interpolated spectra retained the spectral characteristics of the original speech. If distortion did occur, it was in the high frequency regions and, as such, is perceptually insignificant. For vector quantization purposes it could prove detrimental; however, since the majority of distance values obtained were small, that would probably not be the case.

At five percent clipping, interpolation introduced energy at approximately 1.5 kHz. Clipped spectra followed the original in this region. Clipped and interpolated speech retained the second formant, and both shifted the third slightly to the right. Interpolation, however, caused a slightly greater shift. The shifting is not considered significant at five percent clipping for two reasons. First, formant splitting is less damaging than formant merging, and the amount of splitting was not severe [23]. Second, the third formant has little effect on the phonemic identity or color of a vowel. While transition of the third formant does affect consonant perception, it is less important than the effect caused by second formant transitions [24].

The effects of ten percent clipping were similar to those at five percent. At that level, however, clipping begins to introduce a second formant at 1 kHz. This is considered a severe distortional effect because the frequency position of the second formant determines the vowel being synthesized [24].

At twenty percent, both clipping and interpolation cause a major shift of the third formant toward higher frequencies. In some speech frames, interpolation produces a shift of the second formant toward low frequencies and a broadening of the first formant. Figures 11 and 12 illustrate the severe distortion possible at that level. In both examples, clipping introduced a prominent second formant at 1 kHz and removed the second and third formants of the original speech.

Distance Measures

Table 1 lists the distance values for a ten percent clipping level. In general, for both the Euclidean and the rms log distances, values less than two indicated similar spectra, while values greater than six implied a gross mismatch. Values between the two were ambiguous; however, for those cases it appeared that similarity existed in the low frequency regions and not in the high. At 1, 5, and 10 percent thresholds the total distance values obtained for clipped speech were slightly less than those obtained for interpolated speech. At 20 percent the total interpolated distances were slightly less. The differences in total distortion values between the two were insignificant. The net result indicated that interpolation did not noticeably improve the spectral distortion introduced by clipping. The correlation revealed by the scatter diagrams (Figure 13), and the observation that a low distance corresponded to similar spectra while a high distance corresponded to dissimilar spectra, supported using a log area ratio Euclidean distance as a spectral distortion measure.

Figure 12. Power spectra for frame 25 at 20% clipping threshold. [Superimposed log power spectra of the original, clipped, and interpolated frames, 0-5 kHz.]

Table 1. Euclidean and log spectra distances for 10 percent clipping level.

            Euclidean Distance (parametric)      Spectral Distance
Frame       Interpolated    Clipped              Interpolated    Clipped
6           0.681           0.681                1.297           1.297
7           3.621           1.955                2.337           2.298
8           2.778           1.739                2.386           1.316
9           5.765           1.447                4.267           1.501
10          3.729           1.896                3.313           1.930
11          1.327           1.479                1.389           2.059
12          2.399           1.920                2.254           2.600
13          2.488           1.577                2.287           1.954
14          6.178           2.106                4.756           2.818
15          2.321           2.521                2.682           2.796
16          3.257           3.259                3.600           3.323
17          3.870           4.056                4.134           3.711
18          4.007           4.490                4.233           4.404
19          6.230           4.902                5.292           4.252
20          4.881           3.884                4.608           3.503
21          4.181           4.072                4.391           3.953
22          2.134           2.607                2.960           3.692
23          5.050           5.695                4.503           5.952
24          6.614           6.638                4.220           5.620
25          4.137           5.825                3.151           6.732
26          3.859           4.229                3.976           6.092
29          1.402           2.193                1.435           1.959
36          2.562           2.972                2.455           2.660
37          4.667           3.634                3.930           3.424
38          3.448           2.616                3.430           2.237
39          4.444           2.450                3.430           1.577
40          3.542           0.854                4.479           0.681
40          3.819           2.241                3.631           1.499
42          1.240           1.218                1.218           1.646
43          1.348           1.055                2.344           0.840
44          1.365           1.116                1.075           0.674
45          0.521           0.579                0.337           0.359

The results of the informal listening tests indicated that perceptually the synthesized and unsynthesized clipped and interpolated speech are about equally distorted. At a twenty percent clipping level the interpolated speech sounded worse.

Figure 13. Interpolated and clipped speech scatter diagram at 10% clipping level. [Two panels of spectral distance versus log area ratio distance: (a) interpolated speech; (b) clipped speech.]

CHAPTER 6

CONCLUSION

The results of the study indicate that, for the data base used, the method employed did not enhance the quality of clipped speech. When one to five percent of the total speech data points were saturated, clipping and interpolation did not severely distort linear predictive modeling of the speech spectrum. Distance values were small and perceptual distortion was insignificant. At ten and twenty percent saturation, clipping and interpolation produced severe spectral and perceptual distortion. Clipping introduced a prominent second formant in some modeled spectra. The Euclidean log area ratio and rms log spectral distance values obtained at these levels were high. It is believed a vector quantization system would falsely classify the speech sound. Although perceptual degradation is apparent, the speech is intelligible at ten and twenty percent saturation.

Even though the overall results indicate the method did not improve speech quality, there did exist speech frames in which interpolation significantly reduced the spectral degradation caused by clipping. In general, this occurred for smoothly varying speech. In those cases, the dynamic range of the points used for interpolating was not usually large. As a result, the interpolated speech adequately approximated the original speech, and the interpolated modeled spectra more accurately represented the original spectra. This suggests a faster sampling rate might improve the results for the method explored here. Additionally, different interpolating functions might improve speech quality.

LIST OF REFERENCES

1. Fawe, A. L., "Interpretation of Infinitely Clipped Speech Properties," IEEE Trans. Audio Electroacoust., Vol. 14, pp. 178-183, 1966.

2. Licklider, J. C. R. and I. Pollack, "Effects of Differentiation, Integration, and Infinite Peak Clipping Upon the Intelligibility of Speech," J. Acoust. Soc. Amer., Vol. 20, pp. 42-51, 1948.

3. Rabiner, L. R. and R. W. Schafer, Digital Processing of Speech Signals, New Jersey: Prentice-Hall, 1978.

4. Atal, B. S., M. R. Schroeder and V. Stover, "Voice-Excited Predic­ tive Coding Systems for Low Bit-Rate Transmission of Speech," Proc. ICC, pp. 30.37-30.40, 1975.

5. Atal, B. S. and M. R. Schroeder, "Adaptive Predictive Coding of Speech Signals," Bell Syst. Tech. J., Vol. 49, pp. 1973-1986, 1970.

6. Markel, J. D. and A. H. Gray, Linear Prediction of Speech, Berlin, Germany: Springer, 1976.

7. Wong, D. Y., B. H. Juang and A. H. Gray, Jr., "Recent Developments in Vector Quantization for Speech Processing," IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Atlanta, March 1981.

8. Wong, D. Y., B. H. Juang and A. H. Gray, Jr., "An 800 bits/s Vector Quantization LPC Vocoder," IEEE Trans. Acoust., Speech, Signal Processing, Vol. 30, pp. 770-780, 1982.

9. Rosenberg, A. E. and K. L. Shipley, "Speaker Identification and Verification Combined with Speaker Independent Word Recognition," IEEE Int. Conf. Acoustics, Speech and Signal Processing, Atlanta, March 1981.

10. Atal, B. S. and S. L. Hanauer, "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave," J. Acoust. Soc. Amer., Vol. 50, pp. 637-655, 1971.

11. Gray, R. M., A. Buzo, A. H. Gray, Jr. and Y. Matsuyama, "Distortion Measures for Speech Processing," IEEE Trans. Acoust., Speech, Signal Processing, Vol. 28, pp. 367-376, 1980.

12. Barnwell, T. P., III, "Objective Measures for Speech Quality Testing," J. Acoust. Soc. Amer., Vol. 66, pp. 1658-1663, 1979.

13. Barnwell, T. P., III, "Improved Objective Speech Quality Measures for Better Speech Synthesis," 24th Midwest Symposium on Circuits and Systems, Albuquerque, June 1981.

14. Gray, A. H., Jr. and J. D. Markel, "Distance Measures for Speech Processing," IEEE Trans. Acoust., Speech, Signal Processing, Vol. 24, pp. 380-391, 1976.

15. Sambur, M. R. and N. S. Jayant, "LPC Analysis/Synthesis from Speech Inputs Containing Quantizing Noise or Additive White Noise," IEEE Trans. Acoust., Speech, Signal Processing, Vol. 24, pp. 488-494, 1976.

16. Tribolet, J. M., L. R. Rabiner and M. M. Sondhi, "Statistical Properties of an LPC Distance Measure," IEEE Trans. Acoust., Speech, Signal Processing, Vol. 27, pp. 550-556, 1979.

17. Viswanathan, R., J. Makhoul and A. W. F. Huggins, "Speech Compression and Evaluation," Bolt Beranek and Newman Report No. 3794, April 1978.

18. Viswanathan, R., W. Russell and J. Makhoul, "Objective Speech Quality Evaluation of Narrowband LPC Vocoders," IEEE Int. Conf. Acoustics, Speech and Signal Processing, Tulsa, April 1978.

19. Viswanathan, R. and J. Makhoul, "Quantization Properties of Transmission Parameters in Linear Predictive Systems," IEEE Trans. Acoust., Speech, Signal Processing, Vol. 23, pp. 309-321, 1975.

20. Itahashi, S. and S. Yokoyama, "Automatic Formant Extraction Utilizing Mel Scale and Equal Loudness Contour," IEEE Int. Conf. Acoustics, Speech and Signal Processing, Philadelphia, April 1976.

21. Makhoul, J. and L. Cosell, "LPCW: An LPC Vocoder with Linear Predictive Spectral Warping," IEEE Int. Conf. Acoustics, Speech and Signal Processing, Philadelphia, April 1976.

22. Makhoul, J., "Methods for Nonlinear Spectral Distortion of Speech Signals," IEEE Int. Conf. Acoustics, Speech and Signal Processing, Philadelphia, April 1976.

23. Harris, K. S., H. S. Hoffman, A. M. Liberman, P. C. Delattre and F. S. Cooper, "Effect of Third-Formant Transition on the Perception of the Voiced Stop Consonants," J. Acoust. Soc. Amer., Vol. 30, pp. 122-127, 1958.

24. Viswanathan, R., J. Makhoul and W. Russell, "Towards Perceptually Consistent Measures of Spectral Distance," IEEE Int. Conf. Acoustics, Speech and Signal Processing, Philadelphia, April 1976.

25. Barnwell, T. P., Ill and A. M. Bush, "Correlation Between Objective and Subjective Measures for Speech Quality," IEEE Int. Conf. Acoustics, Speech and Signal Processing, Tulsa, April 1978.

26. Juang, B. H., D. Y. Wong and A. H. Gray, "Distortion Performance of Vector Quantization for LPC Voice Coding," IEEE Trans. Acoust., Speech, Signal Processing, Vol. 30, pp. 294-303, 1982.

27. Ladefoged, P., A Course in Phonetics, New York: Harcourt Brace Jovanovich, Inc., 1975.

28. Makhoul, J., "Linear Prediction: A Tutorial Review," Proc. IEEE, Vol. 63, pp. 561-580, 1975.

29. Makhoul, J., R. Viswanathan and W. Russell, "A Framework for the Objective Evaluation of Vocoder Speech Quality," IEEE Int. Conf. Acoustics, Speech and Signal Processing, Philadelphia, April 1976.

30. Markel, J. D. and A. H. Gray, "A Comparison of Three Methods for Coefficient Quantization and Bit Allocation," IEEE Int. Conf. Acoustics, Speech and Signal Processing, Philadelphia, April 1976.

31. Schroeder, M. R., "Models of Hearing," Proc. IEEE, Vol. 63, pp. 1332-1350, 1975.

32. de Souza, P. V., "Statistical Tests and Distance Measures for LPC Coefficients," IEEE Trans. Acoust., Speech, Signal Processing, Vol. 25, pp. 554-559, 1977.