Implementation of Low-Bit Rate Audio Codec, Codec2, in Verilog on Modern FPGAs

ABSTRACT

IMPLEMENTATION OF LOW-BIT RATE AUDIO CODEC, CODEC2, IN VERILOG ON MODERN FPGAS

by Santhiya Sampath Kumar

Audio compression codecs are an important application in the Internet of Things (IoT), where small sensing devices gather voice signals and must then transmit the information to aggregating servers at low cost. In this work, we implement Codec2, a lossy speech compression codec, in Verilog, map it to an Intel Cyclone IV FPGA, and evaluate the result. We describe the details of our implementation approach, including how we converted the C code of Codec2, how we represent data inside the hardware implementation, and the associated cost of this implementation on a real FPGA. We then compare our implementation with a microprocessor implementation to measure the performance of the FPGA relative to the microprocessor. Our hardware implementation of Codec2 sounds qualitatively the same when listening to the spoken transmission, has an error rate of 6.55 bits per frame (48 bits), and is 1.73 times faster than the microprocessor implementation.

IMPLEMENTATION OF LOW-BIT RATE AUDIO CODEC, CODEC2, IN VERILOG ON MODERN FPGAS

A Thesis Submitted to the Faculty of Miami University in partial fulfillment of the requirements for the degree of Master of Science
by Santhiya Sampath Kumar
Miami University, Oxford, Ohio, 2020
Advisor: Dr. Peter Jamieson
Reader: Dr. Chi-Hao Cheng
Reader: Dr. Sahin Gokhan

©2020 Santhiya Sampath Kumar

This Thesis titled IMPLEMENTATION OF LOW-BIT RATE AUDIO CODEC, CODEC2, IN VERILOG ON MODERN FPGAS by Santhiya Sampath Kumar has been approved for publication by The College of Engineering and Computing and Department of Electrical and Computer Engineering
Dr. Peter Jamieson
Dr. Chi-Hao Cheng
Dr. Sahin Gokhan

Table of Contents

List of Tables
List of Figures
Acknowledgements
1 Chapter 1: Introduction
2 Chapter 2: Background
  2.1 Audio Compression
  2.2 Codec2
  2.3 Hardware Implementations of Lossy Audio Compression Codec
3 Chapter 3: Implementation Details
  3.1 Number Representation
  3.2 C code to FSM
  3.3 External and Internal IP Cores for Base Operations
  3.4 Memory model implementation of the C code in Verilog
  3.5 Codec2 Modules
4 Chapter 4: Results
  4.1 FPGA Utilization
  4.2 Quality of Implementation
  4.3 Performance Results
5 Chapter 5: Discussion on Converting Other Codec2 Configurations
6 Chapter 6: Conclusion
References
A Finite State Machines of the Top Verilog Modules
  A.1 codec2 encoder mode2400.v
  A.2 codec2 encoder one frame mode2400.v
  A.3 analyse one frame.v
  A.4 speech to uq lsps.v
  A.5 encode lsp scalar.v
  A.6 encode WoE.v
B Verilog Implementation
  B.1 codec2 encoder mode2400.v
  B.2 CODEC2 encoder one frame mode2400.v
  B.3 analyse one frame.v
  B.4 speech to uq lsps.v
  B.5 encode lsp scalar.v
  B.6 encode WoE.v

List of Tables

2.1 Allocation of bits per FRAME
3.1 Cyclone IV EP4CE115F29C7 resource utilization of 32-bit multipliers
3.2 Encoder blocks and states
3.3 List of IP cores for Base Operations in our 32-bit representation used in the implementation.
3.4 Verilog modules implemented for Codec2 Encoder
3.5 Resource utilization of the encoder blocks
4.1 Cyclone IV EP4CE115F29C7 resource utilization of Codec2 encoder to process one FRAME
4.2 Cyclone IV EP4CE115F29C7 resource utilization of Codec2 encoder to process 150 FRAMES
4.3 Cyclone IV EP4CGX150DF31I7AD resource utilization of Codec2 encoder to process one FRAME
4.4 Cyclone IV EP4CGX150DF31I7AD resource utilization of Codec2 encoder to process 150 FRAMES
4.5 Timing Analysis of FPGA C2 Vs RaspberryPi’s ARM processor

List of Figures

2.1 Digital Voice Radio System
2.2 Codec2 Encoder Block Diagram
3.1 32-bit Fixed-point representation
3.2 FSM model of the Codec2 encoder for one FRAME (20 ms)
3.3 FSM model with parallel states
3.4 An example of RAM implementation
3.5 C to FSM conversion
3.6 Codec2 Verilog module names overlaid in the block diagram
4.1 Codec2 output of the hts1a.raw processed in C
4.2 Codec2 output of the hts1a.raw processed in Verilog
4.3 Codec2 output of the hts2a.raw processed in C
4.4 Codec2 output of the hts2a.raw processed in Verilog
A.1 FSM of codec2 encoder mode2400.v
A.2 FSM of codec2 encoder one frame mode2400.v
A.3 FSM of CODEC2 encoder one frame mode2400.v (continued)
A.4 FSM of analyse one frame.v
A.5 FSM of speech to uq lsps.v
A.6 FSM of speech to uq lsps.v (continued)
A.7 FSM of encode lsp scalar.v
A.8 FSM of encode WoE.v

Acknowledgements

This thesis is a major milestone in my journey of research, and I am therefore very happy to thank all who have supported me in reaching it. First, I would like to express my heartfelt gratitude to my thesis advisor, Dr. Peter Jamieson, for his supervision, advice, and guidance from the very early stage of this research and for providing
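
The figures quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it takes the 48-bit frame and the 6.55-bit error rate from the abstract, the 20 ms frame duration from the caption of Figure 3.2, and the 1.73x speedup from the abstract. The function names are hypothetical, and reading the "error rate" as an average number of differing bits per frame is an assumption, since the excerpt does not define it.

```python
# Figures taken from the thesis front matter.
FRAME_BITS = 48          # encoded bits per frame (abstract)
FRAME_MS = 20            # frame duration in milliseconds (caption of Figure 3.2)
ERROR_BITS = 6.55        # reported error rate in bits per frame (abstract)
SPEEDUP = 1.73           # FPGA versus microprocessor (abstract)


def bit_rate_bps(frame_bits, frame_ms):
    """Bits per second produced when one frame is emitted every frame_ms."""
    return frame_bits * 1000 // frame_ms


def error_fraction(error_bits, frame_bits):
    """Share of the frame's bits that are in error (assumed interpretation)."""
    return error_bits / frame_bits


def relative_runtime(speedup):
    """Runtime of the faster implementation as a fraction of the slower one."""
    return 1.0 / speedup


if __name__ == "__main__":
    print("bit rate (bit/s):", bit_rate_bps(FRAME_BITS, FRAME_MS))
    print("error fraction: %.3f" % error_fraction(ERROR_BITS, FRAME_BITS))
    print("relative runtime: %.3f" % relative_runtime(SPEEDUP))
```

Under these assumptions, 48 bits every 20 ms gives the 2400 bit/s rate that the module names (mode2400) suggest, the error rate corresponds to roughly 14% of the bits in a frame, and the FPGA needs about 58% of the microprocessor's time.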
Recommended publications
  • Digital Speech Processing— Lecture 17
    Speech Coding Methods Based on Speech Models

    Waveform Coding versus Block Processing
    • Waveform coding
      – sample-by-sample matching of waveforms
      – coding quality measured using SNR
    • Source modeling (block processing)
      – block processing of signal => vector of outputs every block
      – overlapped blocks (Block 1, Block 2, Block 3)

    Model-Based Speech Coding
    • we’ve carried waveform coding based on optimizing and maximizing SNR about as far as possible
      – achieved bit rate reductions on the order of 4:1 (i.e., from 128 Kbps PCM to 32 Kbps ADPCM) at the same time achieving toll quality SNR for telephone-bandwidth speech
    • to lower bit rate further without reducing speech quality, we need to exploit features of the speech production model, including:
      – source modeling
      – spectrum modeling
      – use of codebook methods for coding efficiency
    • we also need a new way of comparing performance of different waveform and model-based coding methods
      – an objective measure, like SNR, isn’t an appropriate measure for model-based coders since they operate on blocks of speech and don’t follow the waveform on a sample-by-sample basis
      – new subjective measures need to be used that measure user-perceived quality, intelligibility, and robustness to multiple factors

    Topics Covered in this Lecture
    • Enhancements for ADPCM Coders
      – pitch prediction
      – noise shaping
    • Analysis-by-Synthesis Speech Coders
      – multipulse linear prediction coder (MPLPC)
      – code-excited linear prediction (CELP)
    • Open-Loop Speech Coders
      – two-state excitation
  • Linux Sound Subsystem Documentation Release 4.13.0-Rc4+
    Linux Sound Subsystem Documentation, Release 4.13.0-rc4+
    The kernel development community
    Sep 05, 2017

    CONTENTS
    1 ALSA Kernel API Documentation
      1.1 The ALSA Driver API
      1.2 Writing an ALSA Driver
    2 Designs and Implementations
      2.1 Standard ALSA Control Names
      2.2 ALSA PCM channel-mapping API
      2.3 ALSA Compress-Offload API
      2.4 ALSA PCM Timestamping
      2.5 ALSA Jack Controls
      2.6 Tracepoints in ALSA
      2.7 Proc Files of ALSA Drivers
      2.8 Notes on Power-Saving Mode
      2.9 Notes on Kernel OSS-Emulation
      2.10 OSS Sequencer Emulation on ALSA
    3 ALSA SoC Layer
      3.1 ALSA SoC Layer Overview
      3.2 ASoC Codec Class Driver
      3.3 ASoC Digital Audio Interface (DAI)
      3.4 Dynamic Audio Power Management for Portable Devices
      3.5 ASoC Platform Driver
      3.6 ASoC Machine Driver
      3.7 Audio Pops
  • Mpeg Vbr Slice Layer Model Using Linear Predictive Coding and Generalized Periodic Markov Chains
    MPEG VBR SLICE LAYER MODEL USING LINEAR PREDICTIVE CODING AND GENERALIZED PERIODIC MARKOV CHAINS
    Michael R. Izquierdo* and Douglas S. Reeves**
    *Network Hardware Division, IBM Corporation, Research Triangle Park, NC 27709
    **Electrical and Computer Engineering, North Carolina State University, Raleigh, North Carolina 27695

    ABSTRACT
    We present an MPEG slice layer model for VBR encoded video using Linear Predictive Coding (LPC) and Generalized Periodic Markov Chains. Each slice position within an MPEG frame is modeled using an LPC autoregressive function. The selection of the particular LPC function is governed by a Generalized Periodic Markov Chain; one chain is defined for each I, P, and B frame type. The model is sufficiently modular in that sequences which exclude B frames can eliminate the corresponding Markov Chain. We show that the model matches the pseudo-periodic autocorrelation function quite well. We present simulation results of an Asynchronous Transfer Mode (ATM) video transmitter

    The ATM Network has gained much attention as an effective means to transfer voice, video and data information over computer networks. ATM provides an excellent vehicle for video transport since it provides low latency with minimal delay jitter when compared to traditional packet networks [11]. As a consequence, there has been much research in the area of the transmission and multiplexing of compressed video data streams over ATM.
    Compressed video differs greatly from classical packet data sources in that it is inherently quite bursty. This is due to both temporal and spatial content variations, bounded by a fixed picture display rate. Rate control techniques, such as CBR (Constant Bit Rate), were developed in order to reduce the burstiness of a video stream.
  • Audio Codecs
    Audio Codecs [ AoIP | Leased Line | E1 ]
    Release date: July 2019

    All rights reserved. Permission to reprint or electronically reproduce any document or graphic in whole or in part for any reason is prohibited unless prior written consent is obtained from AVT Audio Video Technologies GmbH. This catalogue has been put together with the utmost diligence. However, no guarantee for correctness can be given. AVT Audio Video Technologies GmbH cannot be held responsible for any misleading or incorrect information provided throughout this catalogue. AVT Audio Video Technologies GmbH reserves the right to change specifications at any time without notice.

    CONTENT
    General
    Features & Symbols
    Overview
    ISDN + VoIP
    ● MAGIC D7 XIP & MAGIC DC7 XIP RM Audio Codecs
      ○ Application: Audio contribution
    ISDN + AoIP
    ● MAGIC AC1 XIP & MAGIC AC1 XIP RM Audio Codecs
      ○ Application: Audio contribution
    E1 + AoIP
    ● MAGIC ACip3 & MAGIC ACip3 2M Audio Codecs
      ○ Application: Audio contribution
      ○ Application: AoIP distribution
    ● MAGIC ACip3 (2M) ModNet System
      ○ Application: Studio-Transmitter-Links
    Audio Codec Integration
    ● MAGIC THipPro ACconnect
    System Manager Upgrade

    General
    Audio Codecs are needed for high-quality Audio transmissions over different networks like IP, ISDN, 2-Mbit/s (E1) and X.21. Over IP and ISDN, both Leased Line connections as well as temporary dial-up connections can be used.
    [...] bitrate, the desired quality and the acceptable delay. The EBU names the following Audio algorithms as mandatory to comply with the AoIP standard. G.711, G.722, ISO/MPEG Layer 2 and PCM (for stationary Audio Co-
  • Centauri II Multichannel Audio Gateway Codec – a New Generation Conquers the Control-Room
    Centauri II Multichannel Audio Gateway Codec – a new generation conquers the Control-room.

    New!
    • 6ms Latency
    • 5.1 / 7.1 Multichannel
    • Front-panel Hot Keys
    • Gateway Function
    • Backup Function
    • Twin/Quad Codec
    • ASI

    Most Audio-Codecs are specialists. The CENTAURI II simply enables you to do everything. An unbeatable range of features makes the CENTAURI II simpler, safer and more cost-effective to use than any other codec.

    The CENTAURI II is your universal Audio Codec for every imaginable project. There are no networks that can stop a CENTAURI II, whether ISDN or Ethernet, X.21 or E1. There are no protocols that the CENTAURI II cannot understand. This codec can be simply and easily integrated into every imaginable IT infrastructure. And its more than 15 coding algorithms cover the entire range currently in general use. Including MPEG, AES Transparent and APT – simultaneously! By other manufacturers this would still be a legitimate question but by MAYAH this has long been possible.

    Combinations of its many and versatile features permit a wide range of applications; from Gateway, Backup Codec or Streaming-Server to Multichannel Codec.

    Considering the extensive system support it is clear that the CENTAURI II is an audio codec for all situations. Whether for Broadcasting, for DVB-H or UMTS transmissions, to name but a few. In light of so much technical sophistication, it’s hardly surprising to learn that the CENTAURI II is also the first audio codec to offer professional 5.1/7.1 multichannel transmissions.
  • Low Bit-Rate Speech Coding with Vq-Vae and a Wavenet Decoder
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 735-739. IEEE, 2019. DOI: 10.1109/ICASSP.2019.8683277.
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    LOW BIT-RATE SPEECH CODING WITH VQ-VAE AND A WAVENET DECODER
    Cristina Gârbacea (1), Aäron van den Oord (2), Yazhe Li (2), Felicia S C Lim (3), Alejandro Luebs (3), Oriol Vinyals (2), Thomas C Walters (2)
    (1) University of Michigan, Ann Arbor, USA; (2) DeepMind, London, UK; (3) Google, San Francisco, USA

    ABSTRACT
    In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality.

    [...] compute the true information rate of speech to be less than 100 bps, yet current systems typically require a rate roughly two orders of magnitude higher than this to produce good quality speech, suggesting that there is significant room for improvement in speech coding.
    The WaveNet [8] text-to-speech model shows the power of learning from raw data to generate speech. Kleijn et al. [9] use a learned WaveNet decoder to produce audio comparable
  • UMTS); LTE; Performance Characterization of the Adaptive Multi-Rate Wideband (AMR-WB) Speech Codec (3GPP TR 26.976 Version 10.0.0 Release 10)
    ETSI TR 126 976 V10.0.0 (2011-04)
    Technical Report
    Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications System (UMTS); LTE; Performance characterization of the Adaptive Multi-Rate Wideband (AMR-WB) speech codec (3GPP TR 26.976 version 10.0.0 Release 10)

    Reference: RTR/TSGS-0426976va00
    Keywords: GSM, LTE, UMTS

    ETSI, 650 Route des Lucioles, F-06921 Sophia Antipolis Cedex - FRANCE
    Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
    Siret N° 348 623 562 00017 - NAF 742 C
    Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

    Important notice
    Individual copies of the present document can be downloaded from: http://www.etsi.org
    The present document may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF). In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within ETSI Secretariat.
    Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp
    If you find errors in the present document, please send your comment to one of the following services: http://portal.etsi.org/chaircor/ETSI_support.asp

    Copyright Notification
    No part may be reproduced except as authorized by written permission.
  • Speech Compression
    information Review Speech Compression Jerry D. Gibson Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93118, USA; [email protected]; Tel.: +1-805-893-6187 Academic Editor: Khalid Sayood Received: 22 April 2016; Accepted: 30 May 2016; Published: 3 June 2016 Abstract: Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed. Keywords: speech coding; voice coding; speech coding standards; speech coding performance; linear prediction of speech 1. Introduction Speech coding is a critical technology for digital cellular communications, voice over Internet protocol (VoIP), voice response applications, and videoconferencing systems. In this paper, we present an abridged history of speech compression, a development of the dominant speech compression techniques, and a discussion of selected speech coding standards and their performance. We also discuss the future evolution of speech compression and speech compression research. We specifically develop the connection between rate distortion theory and speech compression, including rate distortion bounds for speech codecs. We use the terms speech compression, speech coding, and voice coding interchangeably in this paper. The voice signal contains not only what is said but also the vocal and aural characteristics of the speaker. As a consequence, it is usually desired to reproduce the voice signal, since we are interested in not only knowing what was said, but also in being able to identify the speaker.
  • Audio Codec (1)
    Audio Codec (1)
    General (medium to high bit rate)

    Codec | Bit rate | Sampling frequency | Number of channels | Description
    AC-3 | 640 kbps (max.); 448 kbps (DVD, Digital cable TV); 384 kbps (ATSC) | - | Multi | Belonging to Dolby Digital, supporting multi-channel audio, used on DVD
    PCM | Varied; 64 kbps (DS0) | - | Up to 8 | Pulse-code modulation, digital representation of an analogue signal by sampling the magnitude of the signal at uniform intervals, used in digital telephone systems and digital audio in computers and CDs
    AAC | - | 8 – 96 kHz | - | Advanced Audio Coding
    ATRAC | 48, 64, 66, 132, 256 kbps | - | - | Adaptive Transform Acoustic Coding, developed by Sony, used to store information on Minidisc
    DTS | 768 – 1536 kbps (6-channel) | - | Multi | Digital Theatre System, used for in-movie sound on film and on DVD
    MP1 | 384 kbps | Varied | 1, 2 | Lowest encoder complexity
    MP2 | 256 – 384 kbps (excellent); 224 – 256 kbps (very good); 192 – 224 kbps (good) | Varied | 1, 2 | More complex encoder and decoder, able to remove more of the signal redundancy and to apply the psychoacoustic threshold more efficiently
    MP3 | 224 – 320 kbps (excellent); 192 – 224 kbps (very good); 128 – 192 kbps (good) | 32, 44.1, 48 kHz | 1, 2 | More complex, directed towards lower bit rate applications
    Musepack | 160 – 180 kbps | - | 2 | Known as MPC, MPEGplus, MPEG+ or MP+, a derivative of MP2
    TwinVQ | Constant bitrate at 80, 96, 112, 128, 160, 192 kbps | - | - | Developed by Nippon Telegraph and Telephone Corporation
    Vorbis | 45 – 500 kbps | - | - | Open and free codec project from the Xiph.org Foundation
    WMA | Constant and variable bit rate | - | Multi | Developed by Microsoft
  • Ambisonic Coding with Spatial Image Correction Pierre Mahé, Stéphane Ragot, Sylvain Marchand, Jérôme Daniel
    Ambisonic Coding with Spatial Image Correction
    Pierre Mahé (1,2), Stéphane Ragot (1), Sylvain Marchand (2), Jérôme Daniel (1)
    (1) Orange Labs, Lannion, France; (2) L3i, Université de La Rochelle, France

    To cite this version: Pierre Mahé, Stéphane Ragot, Sylvain Marchand, Jérôme Daniel. Ambisonic Coding with Spatial Image Correction. European Signal Processing Conference (EUSIPCO) 2020, Jan 2021, Amsterdam (virtual), Netherlands. hal-03042322
    HAL Id: hal-03042322 https://hal.archives-ouvertes.fr/hal-03042322 Submitted on 6 Dec 2020

    Abstract: We present a new method to enhance multi-mono coding of ambisonic audio signals. In multi-mono coding, each component is represented independently by a mono core codec, this may introduce strong spatial artifacts. The proposed method is based on the correction of spatial images derived from the sound-field power map of original and degraded ambisonic sig-

    [...] re-creates the spatial scene based on a mono downmix and these spatial parameters. This method has been extended to High-Order Ambisonics (HOA) in HO-DirAC [9] where the sound field is divided into angular sectors; for each angular sector, one source is extracted.
  • Input Formats & Codecs
    Input Formats & Codecs
    Pivotshare offers upload support to over 99.9% of codecs and container formats. Please note that video container formats are independent of codec support.

    Input Video Container Formats (Independent of codec)
    3GP/3GP2, ASF (Windows Media), AVI, DNxHD (SMPTE VC-3), DV video, Flash Video, Matroska, MOV (Quicktime), MP4, MPEG-2 TS, MPEG-2 PS, MPEG-1, Ogg, PCM, VOB (Video Object), WebM, Many more...

    Unsupported Video Codecs
    Apple Intermediate, ProRes 4444 (ProRes 422 Supported), HDV 720p60, Go2Meeting3 (G2M3), Go2Meeting4 (G2M4), ER AAC LD (Error Resilient, Low-Delay variant of AAC), REDCODE

    Supported Video Codecs
    3ivx, 4X Movie, Alaris VideoGramPiX, Alparysoft lossless codec, American Laser Games MM Video, AMV Video, Apple QuickDraw, ASUS V1, ASUS V2, ATI VCR-2, ATI VCR1, Auravision AURA, Auravision Aura 2, Autodesk Animator Flic video, Autodesk RLE, Avid Meridien Uncompressed, AVImszh, AVIzlib, AVS (Audio Video Standard) video, Beam Software VB, Bethesda VID video, Bink video, Blackmagic 10-bit, Broadway MPEG Capture Codec, Brooktree 411 codec, Brute Force & Ignorance, CamStudio, Camtasia Screen Codec, Canopus HQ Codec, Canopus Lossless Codec, CD Graphics video, Chinese AVS video (AVS1-P2, JiZhun profile), Cinepak, Cirrus Logic AccuPak, Creative Labs Video Blaster Webcam, Creative YUV (CYUV), Delphine Software International CIN video, Deluxe Paint Animation, DivX ;-) (MPEG-4), DNxHD (VC3), DV (Digital Video), Feeble Files/ScummVM DXA, FFmpeg video codec #1, Flash Screen Video, Flash Video (FLV) / Sorenson Spark / Sorenson H.263, Forward Uncompressed Video Codec, fox motion video, FRAPS:
  • Advanced Speech Compression VIA Voice Excited Linear Predictive Coding Using Discrete Cosine Transform (DCT)
    International Journal of Innovative Technology and Exploring Engineering (IJITEE) ISSN: 2278-3075, Volume-2 Issue-3, February 2013
    Advanced Speech Compression VIA Voice Excited Linear Predictive Coding using Discrete Cosine Transform (DCT)
    Nikhil Sharma, Niharika Mehta

    Abstract: One of the most powerful speech analysis techniques is the method of linear predictive analysis. This method has become the predominant technique for representing speech for low bit rate transmission or storage. The importance of this method lies both in its ability to provide extremely accurate estimates of the speech parameters and in its relative speed of computation. The basic idea behind linear predictive analysis is that the speech sample can be approximated as a linear combination of past samples. The linear predictor model provides a robust, reliable and accurate method for estimating parameters that characterize the linear, time varying system. In this project, we implement a voice excited LPC vocoder for low bit rate speech

    LPC makes coding at low bit rates possible. For LPC-10, the bit rate is about 2.4 kbps. Even though this method results in an artificial sounding speech, it is intelligible. This method has found extensive use in military applications, where a high quality speech is not as important as a low bit rate to allow for heavy encryptions of secret data. However, since a high quality sounding speech is required in the commercial market, engineers are faced with using other techniques that normally use higher bit rates and result in higher quality output. In LPC-10 vocal tract is represented as a time-varying filter and speech is windowed about every 30ms.
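
The abstract above states the core idea of linear prediction: a speech sample is approximated as a linear combination of past samples, s_hat[n] = a_1*s[n-1] + ... + a_p*s[n-p]. The following is a minimal Python sketch of that idea, not the authors' vocoder. The excerpt does not say how the coefficients a_k are estimated, so the autocorrelation method with the Levinson-Durbin recursion is assumed here as one common choice, and all function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the given autocorrelation.

    Returns (coeffs, err), where coeffs[k-1] is a_k in
    s_hat[n] = sum_k a_k * s[n-k] and err is the residual energy.
    """
    a = [0.0] * (order + 1)      # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err            # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def lpc(x, order):
    """Estimate predictor coefficients for frame x (autocorrelation method)."""
    return levinson_durbin(autocorrelation(x, order), order)


def predict(x, coeffs, n):
    """Predict sample n from the len(coeffs) samples that precede it."""
    return sum(c * x[n - 1 - k] for k, c in enumerate(coeffs))


if __name__ == "__main__":
    # Toy frame: a decaying exponential, which obeys x[n] = 0.9 * x[n-1],
    # so a good predictor should find a_1 close to 0.9 and a_2 close to 0.
    frame = [0.9 ** n for n in range(200)]
    coeffs, residual = lpc(frame, 2)
    print("coefficients:", coeffs)
    print("actual x[10]:", frame[10], "predicted:", predict(frame, coeffs, 10))
```

A real coder would window each frame (the text mentions roughly 30 ms windows for LPC-10) and quantize the coefficients before transmission; both steps are omitted here to keep the sketch short.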