Note to Users

Total Pages: 16

File Type: PDF, Size: 1020 KB

NOTE TO USERS: This reproduction is the best copy available. (UMI)

Lapped Transforms in Perceptual Coding of Wideband Audio

Sien Ruan
Department of Electrical & Computer Engineering
McGill University, Montreal, Canada
December 2004

A thesis submitted to McGill University in partial fulfillment of the requirements for the degree of Master of Engineering.
© 2004 Sien Ruan

Library and Archives Canada, Published Heritage Branch, 395 Wellington Street, Ottawa ON K1A 0N4, Canada. ISBN: 0-494-12642-6.

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats. The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission. In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis. While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

To my beloved parents

Abstract

Audio coding paradigms depend on time-frequency transformations to remove statistical redundancy in audio signals and reduce the data bit rate, while maintaining high fidelity of the reconstructed signal. Sophisticated perceptual audio coding further exploits perceptual redundancy in audio signals by incorporating perceptual masking phenomena. This thesis investigates the different coding transformations that can be used to compute perceptual distortion measures effectively; among them is the lapped transform, which is the most widely used in today's audio coders. Moreover, an innovative lapped transform is developed that can vary its overlap percentage to an arbitrary degree. The new lapped transform is applicable to transient audio, as it captures the time-varying characteristics of the signal.

Sommaire

(French-language version of the Abstract above.)

Acknowledgments

I would like to acknowledge my supervisor, Prof. Peter Kabal, for his support and guidance throughout my graduate studies at McGill University. Prof. Kabal's kind treatment of his students is highly appreciated. I would also like to thank Ricky Der for working with me and advising me throughout the work. My thanks go to my fellow TSP graduate students for their close friendship, especially Alexander M. Wyglinski for his various technical assistance. I am sincerely indebted to my parents for all the encouragement they have given me. They are the reason for who I am today. To my mother, Mrs. Dejun Zhao, and my father, Mr. Liwu Ruan: thank you.

Contents

1 Introduction
  1.1 Audio Coding Techniques
    1.1.1 Parametric Coders
    1.1.2 Waveform Coders
  1.2 Time-to-Frequency Transformations
  1.3 Thesis Contributions
  1.4 Thesis Synopsis
2 Perceptual Audio Coding: Psychoacoustic Audio Compression
  2.1 Human Auditory Masking
    2.1.1 Hearing System
    2.1.2 Perception of Loudness
    2.1.3 Critical Bands
    2.1.4 Masking Phenomena
  2.2 Example Perceptual Model: Johnston's Model
    2.2.1 Loudness Normalization
    2.2.2 Masking Threshold Calculation
    2.2.3 Perceptual Entropy
  2.3 Perceptual Audio Coder Structure
    2.3.1 Time-to-Frequency Transformation
    2.3.2 Psychoacoustic Analysis
    2.3.3 Adaptive Bit Allocation
    2.3.4 Quantization
    2.3.5 Bitstream Formatting
3 Signal Decomposition with Lapped Transforms
  3.1 Block Transforms
  3.2 Lapped Transforms
    3.2.1 LT Orthogonal Constraints
  3.3 Filter Banks: Subband Signal Processing
    3.3.1 Perfect Reconstruction Conditions
    3.3.2 Filter Bank Representation of the LT
  3.4 Modulated Lapped Transforms
    3.4.1 Perfect Reconstruction Conditions
  3.5 Adaptive Filter Banks
    3.5.1 Window Switching with Perfect Reconstruction
4 MP3 and AAC Filter Banks
  4.1 Time-to-Frequency Transformations of MP3 and AAC
    4.1.1 MP3 Transformation: Hybrid Filter Bank
    4.1.2 AAC Transformation: Pure MDCT Filter Bank
  4.2 Performance Evaluation
    4.2.1 Full Coder Description
    4.2.2 Audio Quality Measurements
    4.2.3 Experiment Results
  4.3 Psychoacoustic Transforms of DFT and MDCT
    4.3.1 Inherent Mismatch Problem
    4.3.2 Experiment Results
5 Partially Overlapped Lapped Transforms
  5.1 Motivation of Partially Overlapped LT: NMR Distortion
  5.2 Construction of Partially Overlapped LT
    5.2.1 MLT as DST via Pre- and Post-Filtering
    5.2.2 Smaller Overlap Solution
  5.3 Performance Evaluation
    5.3.1 Pre-echo Mitigation
    5.3.2 Optimal Overlapping Point for Transient Audio
6 Conclusion
  6.1 Thesis Summary
  6.2 Future Research Directions
A Greedy Algorithm and Entropy Computation
  A.1 Greedy Algorithm
  A.2 Entropy Computation

List of Figures

2.1 Absolute threshold of hearing for normal listeners
2.2 Generic perceptual audio encoder
2.3 Sine MDCT window (576 points)
3.1 General signal processing system using the lapped transform
3.2 Signal processing with a lapped transform with L = 2M
3.3 Typical subband processing system, using the filter bank
3.4 Magnitude frequency response of an MLT (M = 10)
4.1 MPEG-1 Layer III decomposition structure
4.2 Layer III prototype filter (b) and the original window (a)
4.3 Magnitude response of the lowpass filter
4.4 Magnitude response of the polyphase filter bank (M = 32)
4.5 Switching from a long sine window to a short one via a start window
4.6 Layer III aliasing butterfly, encoder/decoder
4.7 Layer III aliasing reduction encoder/decoder diagram
4.8 Block diagram of the encoder of the full audio coder
4.9 Frequency response of the MDCT basis function hk(n), M = 4
5.1 Flowgraph of the Modified Discrete Cosine Transform
5.2 Flowgraph of the MDCT as a block DST via butterfly pre-filtering
5.3 Global viewpoint of the MDCT as pre-filtering at DST block boundaries
5.4 Pre-DST lapped transforms at arbitrary overlaps (L < 2M)
5.5 Post-DST lapped transforms at arbitrary overlaps (L < 2M)
5.6 Partially overlapped Pre-DST example showing pre-echo mitigation for sound files of castanets

List of Tables

2.1 Critical bands measured by Scharf
4.1 MOS values and the corresponding subjective quality
4.2 Subjective listening tests: Hybrid filter bank (Hybrid) vs. Pure MDCT filter bank (Pure)
4.3 PESQ MOS values: Hybrid filter bank (Hybrid) vs. Pure MDCT filter bank (Pure)
4.4 PESQ MOS values: DFT spectrum (DFT) vs. MDCT spectrum (MDCT)
5.1 Subjective listening tests of Pre-DST coded test files of castanets

List of Terms

AAC: MPEG-2 Advanced Audio Coding
ADPCM: Adaptive Differential Pulse Code Modulation
CELP: Code Excited Linear Prediction
DCT: Discrete Cosine Transform
DFT: Discrete Fourier Transform
DPCM: Differential Pulse Code Modulation
DST: Discrete Sine Transform
EBU-SQAM: European Broadcasting Union Sound Quality Assessment Material
ERB: Equivalent Rectangular Bandwidth
FIR: Finite Impulse Response
IMDCT: Inverse Modified Discrete Cosine Transform
ITU: International Telecommunication Union
MDCT: Modified Discrete Cosine Transform
MDST: Modified Discrete Sine Transform
MLT: Modulated Lapped Transform
MOS: Mean Opinion Score
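The thesis centres on the modulated lapped transform (MLT), the sine-windowed MDCT listed among the terms above, which achieves perfect reconstruction from 50%-overlapped blocks. As a rough illustration of that idea, here is a minimal editorial sketch in Python/NumPy, not code from the thesis; the block size, random test signal, and direct matrix evaluation are choices made only for brevity.

```python
import numpy as np

def sine_window(M):
    # Sine (MLT) window of length 2M; satisfies w[n]^2 + w[n+M]^2 = 1 (Princen-Bradley condition).
    n = np.arange(2 * M)
    return np.sin(np.pi / (2 * M) * (n + 0.5))

def mdct_basis(M):
    # cos[(pi/M) * (n + 0.5 + M/2) * (k + 0.5)], shape (2M, M)
    n = np.arange(2 * M)[:, None]
    k = np.arange(M)[None, :]
    return np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))

def mdct(block, window):
    # One 2M-sample block -> M coefficients (direct O(M^2) form; real coders use FFT-based versions).
    M = len(block) // 2
    return (window * block) @ mdct_basis(M)

def imdct(coeffs, window):
    # M coefficients -> 2M time-aliased samples; overlap-add of adjacent blocks cancels the aliasing.
    M = len(coeffs)
    return (2.0 / M) * window * (mdct_basis(M) @ coeffs)

# 50%-overlap analysis/synthesis: interior samples come back to machine precision.
M = 8
w = sine_window(M)
x = np.random.randn(6 * M)
y = np.zeros_like(x)
for start in range(0, len(x) - 2 * M + 1, M):
    y[start:start + 2 * M] += imdct(mdct(x[start:start + 2 * M], w), w)
print(np.max(np.abs(x[M:-M] - y[M:-M])))   # ~1e-15 apart from the first and last half blocks
```

Production coders evaluate the same transform with fast FFT-based algorithms and adapt the window around transients; the partially overlapped transforms of Chapter 5 attack the same transient (pre-echo) problem by letting the overlap length L drop below 2M.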
Recommended publications
  • Migrating Radio Call-In Talk Shows to Wideband Audio
    Established 1961. 2013 WBA Broadcasters Clinic, Madison, WI. MIGRATING RADIO CALL-IN TALK SHOWS TO WIDEBAND AUDIO.
    Radio is the original Social Network:
    • Serves a local or national audience
    • Allows real-time commentary from the masses
    • The telephone becomes the medium
    • Telephone technical factors have limited the appeal of the radio "Social Network"
    Telephones have changed over the years, but telephone sound has not changed (and has gotten worse). This is very bad for radio.
    Why do phones sound bad?
    • System designed for efficiency, not comfort
    • Sampling rate of 8 kHz chosen for all calls: 4 kHz maximum response
    • Enough for intelligibility, but loses depth, nuance, personality
    • Listener fatigue
    Why do phones sound so bad? (cont.)
    • The low end of telephone calls has intentional high-pass filtering
    • Meant to avoid AC power hum pickup in phone lines
    • Loses 2-1/2 octaves of speech audio on the low end
    • Not relevant for digital
    Why phones sound bad (cont.): Los Angeles Times, January 10, 2009: Verizon Communications Inc., the second-biggest U.S. telephone company, plans to do away with traditional phone lines within seven years as it moves to carry all calls over the Internet. An Internet-based service can be maintained at a fraction of the cost of a phone network and helps Verizon offer a greater range of services, Stratton said. "We've built our business over the years with circuit-switched voice being our bread and butter... but increasingly, we are in the business of selling, basically, data connectivity," Chief Marketing Officer John Stratton said. VoIP ...
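To make the bandwidth figures above concrete: a narrowband call keeps roughly 300 Hz to 3.4 kHz, while wideband ("HD") voice keeps roughly 50 Hz to 7 kHz. The sketch below is an editorial illustration, not part of the slides; the 16 kHz sampling rate, Butterworth filters, and filter order are assumptions chosen only for the example. It band-limits a signal both ways so the two channels can be compared by ear or by spectrum.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def bandlimit(x, fs, lo_hz, hi_hz, order=4):
    # Band-pass filter emulating a voice channel of the given bandwidth.
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos, x)

fs = 16000                                           # assumed wideband source rate
speech = np.random.randn(fs)                         # stand-in for one second of speech
narrowband = bandlimit(speech, fs, 300.0, 3400.0)    # traditional telephone band
wideband = bandlimit(speech, fs, 50.0, 7000.0)       # wideband ("HD Voice") band
```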
  • Audio Coding for Digital Broadcasting
    Recommendation ITU-R BS.1196-7 (01/2019): Audio coding for digital broadcasting. BS Series: Broadcasting service (sound).
    Foreword. The role of the Radiocommunication Sector is to ensure the rational, equitable, efficient and economical use of the radio-frequency spectrum by all radiocommunication services, including satellite services, and carry out studies without limit of frequency range on the basis of which Recommendations are adopted. The regulatory and policy functions of the Radiocommunication Sector are performed by World and Regional Radiocommunication Conferences and Radiocommunication Assemblies supported by Study Groups.
    Policy on Intellectual Property Right (IPR). ITU-R policy on IPR is described in the Common Patent Policy for ITU-T/ITU-R/ISO/IEC referenced in Resolution ITU-R 1. Forms to be used for the submission of patent statements and licensing declarations by patent holders are available from http://www.itu.int/ITU-R/go/patents/en where the Guidelines for Implementation of the Common Patent Policy for ITU-T/ITU-R/ISO/IEC and the ITU-R patent information database can also be found.
    Series of ITU-R Recommendations (also available online at http://www.itu.int/publ/R-REC/en):
    BO: Satellite delivery
    BR: Recording for production, archival and play-out; film for television
    BS: Broadcasting service (sound)
    BT: Broadcasting service (television)
    F: Fixed service
    M: Mobile, radiodetermination, amateur and related satellite services
    P: Radiowave propagation
    RA: Radio astronomy
    RS: Remote sensing systems
    S: Fixed-satellite service
    SA: Space applications and meteorology
    SF: Frequency sharing and coordination between fixed-satellite and fixed service systems
    SM: Spectrum management
    SNG: Satellite news gathering
    TF: Time signals and frequency standards emissions
    V: Vocabulary and related subjects
    Note: This ITU-R Recommendation was approved in English under the procedure detailed in Resolution ITU-R 1.
  • Polycom Voice
    Level 1 Technical – Polycom Voice
    Contents
    1 - Glossary
    2 - Polycom Voice Networks
        Polycom UC Software
        Provisioning
    3 - Key Features (Desktop and Conference Phones)
        OpenSIP Integration
        Microsoft Integration
        Lync Qualification
        Better Together over Ethernet
        BroadSoft UC-One Integration
        Conference Link / Conference Link2
        Polycom Desktop Connector
  • NTT DOCOMO Technical Journal
    3GPP EVS Codec for Unrivaled Speech Quality and Future Audio Communication over VoLTE. EVS Speech Coding Standardization. Kimitaka Tsutsumi, Kei Kikuiri (Research Laboratories).
    NTT DOCOMO has been engaged in the standardization of the 3GPP EVS codec, which is designed specifically for VoLTE to further enhance speech quality, and has contributed to establishing a far-sighted strategy for making the EVS codec cover a variety of future communication services. NTT DOCOMO has also proposed technical solutions that provide speech quality as high as FM radio broadcasts and that achieve both high coding efficiency and high audio quality not possible with any of the state-of-the-art speech codecs. The EVS codec will drive the emergence of a new style of speech communication entertainment that will combine BGM, sound effects, and voice in novel ways for mobile users.
    1. Introduction. The launch of Voice over LTE (VoLTE) services and flat-rate voice service has demonstrated the importance of high-quality telephony service to mobile users. In line with this trend, the 3rd Generation Partnership Project (3GPP) completed ... Band (AMR-WB) [2] that is used in NTT DOCOMO's VoLTE and that supports wideband speech with a sampling frequency of 16 kHz. In contrast, EVS has been designed to support super-wideband speech with a sampling frequency of 32 kHz, thereby achieving speech of FM-radio quality. ... also encode music at high levels of quality and efficiency for non-real-time services, 3GPP experts agreed to adopt high requirements in the EVS codec for music despite its main target of real-time communication. Furthermore, considering that telephony services using AMR-WB ...
  • The Growing Importance of HD Voice in Applications
    The Growing Importance of HD Voice in Applications (White Paper)
    Executive Summary. A new excitement has entered the voice communications industry with the advent of wideband audio, commonly known as High Definition Voice (HD Voice). Although enterprises have gradually been moving to HD VoIP within their own networks, these networks have traditionally remained "islands" of HD because interoperability with other networks that also supported HD Voice has been difficult. With the introduction of HD Voice on mobile networks, which has been launched on numerous commercial mobile networks and many wireline VoIP networks worldwide, consumers can finally experience this new technology firsthand. Verizon, AT&T, T-Mobile, Deutsche Telekom, Orange and other mobile operators now offer HD Voice as a standard feature. Because mobile users tend to adopt new technology rapidly, replacing their mobile devices seemingly as fast as the newest models are released, and because landline VoIP speech is typically done via a headset, the growth of HD Voice continues to be high and, in turn, the need for HD-capable applications will further accelerate. This white paper provides an introduction to HD Voice and discusses its current adoption rate and future potential, including use case examples which paint a picture that HD Voice upgrades to network infrastructure and applications will be seen as important, and perhaps as a necessity to many.
    Table of Contents
    What Is HD Voice?
    Where is HD Voice Being Deployed?
    Use Case Examples ...
  • Subjective Assessment of Audio Quality – the Means and Methods Within the EBU
    Subjective assessment of audio quality – the means and methods within the EBU. W. Hoeg (Deutsche Telekom Berkom), L. Christensen (Danmarks Radio), R. Walker (BBC).
    This article presents a number of useful means and methods for the subjective quality assessment of audio programme material in radio and television, developed and verified by EBU Project Group P/LIST. The methods, defined in several new EBU Recommendations and Technical Documents, are suitable for both operational and training purposes in broadcasting organizations.
    1. Introduction. The existing EBU Recommendation, R22 [1], states that "the amount of sound programme material which is exchanged between EBU Members, and between EBU Members and other production organizations, continues to increase" and that "the only sufficient method of assessing the balance of features which contribute to the quality of the sound in a programme is by listening to it." Therefore, "listening" is an integral part of all sound and television programme-making operations. Despite the very significant advances of modern sound monitoring and measurement technology, these essentially objective solutions remain unable to tell us what the programme will really sound like to the listener at home.
    An essential prerequisite for ensuring a uniform high quality of sound programmes is to standardize the means and methods required for their assessment. The subjective assessment of sound quality has for a long time been carried out by international organizations such as the (former) OIRT [2][3][4], the (former) CCIR (now ITU-R) [5] and the Member organizations of the EBU itself. It became increasingly important that com-
  • HD Voice – a Revolution in Voice Communication
    HD Voice – a revolution in voice communication. Besides data capacity and coverage, which are among the most important factors in customer satisfaction in mobile telephony nowadays, we must not forget the intrinsic characteristic of mobile communication: the voice. Ever since the nineties and the introduction of GSM there have not been many improvements in the area of voice communication, and the quality of sound has not seen any major changes. Smart Network going forward! Mobile phones have made such progress in recent years that they have almost replaced PCs, but their basic function, voice calls, is still irreplaceable and vital in mobile communication, and it has to be seamless. In order to grow our customer satisfaction and expand our service portfolio, Smart Network engineers at Telenor Serbia have enabled HD Voice by introducing new network features and transitioning voice communication to an all-IP network. This transition delivers crystal-clear communication between the two parties, greatly enhancing the customer experience during voice communication over smartphones. Enough with the yelling into smartphones! HD Voice (or High-Definition Voice) represents a significant upgrade to sound quality in mobile communications. Thanks to this feature, users experience clarity, smoothly reduced background noise and a feeling that the person they are talking to is standing right next to them, or of "being in the same room" with the person on the other end of the phone line. On the more technical side, "HD Voice is essentially wideband audio technology, something that long has been used for conference calling and VoIP apps. Instead of limiting a call frequency to between 300 Hz and 3.4 kHz, a wideband audio call transmits at a range of 50 Hz to 7 kHz, or higher."
  • Equalization
    CHAPTER 3: Equalization. The most intuitive effect. OK, EQ is EZ. U need a Hi EQ IQ.
    MIX SMART QUICK START: Equalization
    GOALS
    ■ Fix spectral problems such as rumble, hum and buzz, pops and wind, proximity, and hiss.
    ■ Fit things into the mix by leveraging any spectral openings available and through complementary cuts and boosts on spectrally competitive tracks.
    ■ Feature those aspects of each instrument that players and music fans like most.
    GEAR
    ■ Master the user controls for parametric, semiparametric, program, graphic, shelving, high-pass and low-pass filters.
    ■ Choose the equalizer with the capabilities you need, focusing particularly on the parameters, slopes, and number of bands available.
    Home stereos have tone controls. We in the studio get equalizers. Perhaps better described as a "spectral modifier" or "frequency-specific amplitude adjuster," the equalizer allows the mix engineer to increase or decrease the level of specific frequency ranges within a signal. Having an equalizer is like having multiple volume knobs for a single track. Unlike the volume knob that attenuates or boosts an entire signal, the equalizer is the tool used to turn down or turn up specific frequency portions of any audio track. Out of all the signal-processing tools that we use while mixing and tracking, EQ is probably the easiest and most intuitive to use, at first. Advanced applications of equalization, however, demand a deep understanding of the effect and all its possibilities. Don't underestimate the intellectual challenge and creative potential of this essential mix processor. (Mix Smart, © 2011 Elsevier Inc. All rights reserved.)
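To make the "multiple volume knobs" idea above concrete, the sketch below builds one parametric (peaking) EQ band as a biquad filter using the widely circulated Audio EQ Cookbook formulas. This is an illustrative addition, not code from the book, and the centre frequency, Q, and gain are arbitrary values chosen for the example.

```python
import numpy as np
from scipy.signal import lfilter

def peaking_eq(fs, f0, gain_db, q):
    # Biquad coefficients for one parametric EQ band (Audio EQ Cookbook peaking filter).
    a_lin = 10.0 ** (gain_db / 40.0)          # amplitude factor
    w0 = 2.0 * np.pi * f0 / fs                # centre frequency in radians/sample
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return b / a[0], a / a[0]                 # normalize so a[0] == 1

fs = 48000
x = np.random.randn(fs)                       # stand-in for one second of an audio track
b, a = peaking_eq(fs, f0=3000.0, gain_db=-6.0, q=1.4)   # cut 6 dB in a band around 3 kHz
y = lfilter(b, a, x)                          # one "frequency-specific volume knob"
```

A full parametric equalizer is simply several such bands in cascade, each with its own centre frequency, bandwidth (Q), and gain.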
  • Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 735-739. IEEE, 2019. DOI: 10.1109/ICASSP.2019.8683277. © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    LOW BIT-RATE SPEECH CODING WITH VQ-VAE AND A WAVENET DECODER. Cristina Gârbacea¹, Aäron van den Oord², Yazhe Li², Felicia S. C. Lim³, Alejandro Luebs³, Oriol Vinyals², Thomas C. Walters² (¹University of Michigan, Ann Arbor, USA; ²DeepMind, London, UK; ³Google, San Francisco, USA).
    ABSTRACT: In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality.
    ... compute the true information rate of speech to be less than 100 bps, yet current systems typically require a rate roughly two orders of magnitude higher than this to produce good quality speech, suggesting that there is significant room for improvement in speech coding. The WaveNet [8] text-to-speech model shows the power of learning from raw data to generate speech. Kleijn et al. [9] use a learned WaveNet decoder to produce audio comparable ...
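As a toy illustration of the quantization step such a codec relies on (an editorial sketch, not the paper's model: the frame rate, latent dimension, and random codebook below are arbitrary stand-ins, whereas the paper learns the codebook jointly with the encoder and a WaveNet decoder), vector quantization replaces each encoder output with its nearest codebook entry, so only the entry indices need to be transmitted.

```python
import numpy as np

def vector_quantize(latents, codebook):
    # latents: (T, D) encoder outputs; codebook: (K, D) entries.
    # Returns nearest-entry indices (what would be transmitted) and the quantized vectors.
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
T, D, K = 50, 64, 512                       # e.g. 50 latent frames per second, 64-dim latents, 512 entries
latents = rng.standard_normal((T, D))       # stand-in for one second of encoder outputs
codebook = rng.standard_normal((K, D))      # stand-in for a learned codebook

idx, quantized = vector_quantize(latents, codebook)
print(T * np.log2(K), "bits per second at this frame rate")   # 50 * 9 = 450 bps
```

The transmitted rate is then (frames per second) times log2(codebook size) bits, which is how such systems reach very low bit rates.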
  • 4. MPEG Layer-3 Audio Encoding
    MPEG Layer-3 – An introduction to MPEG Layer-3. K. Brandenburg and H. Popp, Fraunhofer Institut für Integrierte Schaltungen (IIS). MPEG Layer-3, otherwise known as MP3, has generated a phenomenal interest among Internet users, or at least among those who want to download highly-compressed digital audio files at near-CD quality. This article provides an introduction to the work of the MPEG group which was, and still is, responsible for bringing this open (i.e. non-proprietary) compression standard to the forefront of Internet audio downloads.
    1. Introduction. The audio coding scheme MPEG Layer-3 will soon celebrate its 10th birthday, having been standardized in 1991. In its first years, the scheme was mainly used within DSP-based codecs for studio applications, allowing professionals to use ISDN phone lines as cost-effective music links with high sound quality. In 1995, MPEG Layer-3 was selected as the audio format for the digital satellite broadcasting system developed by WorldSpace. This was its first step into the mass market. Its second step soon followed, due to the use of the Internet for the electronic distribution of music. Here, the proliferation of audio material – coded with MPEG Layer-3 (aka MP3) – has shown an exponential growth since 1995. By early 1999, ".mp3" had become the most popular search term on the Web (according to http://www.searchterms.com). In 1998, the "MPMAN" (by Saehan Information Systems, South Korea) was the first portable MP3 player, pioneering the road for numerous other manufacturers of consumer electronics. "MP3" has been featured in many articles, mostly on the business pages of newspapers and periodicals, due to its enormous impact on the recording industry.
  • EBU Evaluations of Multichannel Audio Codecs
    EBU – TECH 3324: EBU Evaluations of Multichannel Audio Codecs. Status: Report. Source: D/MAE. Geneva, September 2007.
    Contents
    1. Introduction
    2. Participating Test Sites
    3. Selected Codecs for Testing
        3.1 Phase 1
        3.2 Phase 2
    4. Codec Parameters
    5. Test Sequences
        5.1 Phase 1
        5.2 Phase 2
    6. Encoding Process
        6.1 Codecs
        6.2 Verification of bit-rate
  • Methods of Sound Data Compression – Comparison of Different Standards
    Methods of sound data compression – comparison of different standards. Norbert Nowak, Wojciech Zabierowski (Lodz University of Technology). Available at: https://www.researchgate.net/publication/251996736
    Abstract: The following article is about methods of sound data compression. Technological progress has facilitated the process of recording audio on different media such as CD-Audio. The development of audio data compression has made our lives significantly easier. In recent years, much has been achieved in the field of audio and speech compression. Many standards have been established. They are characterized by better sound quality at lower bit rates. This allows the same CD-Audio formats to be recorded using "lossy" or lossless compression algorithms in order to reduce the amount ... multimedia devices, DVD movies, digital television, data transmission, the Internet, etc.
    Modeling and coding: One's requirements decide what type of compression to apply. However, the choice between lossy and lossless methods also depends on other factors. One of the most important is the characteristics of the data that will be compressed. For instance, the same algorithm which effectively compresses text may be completely useless in the case of video and sound compression.
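As a small, editorial illustration of how the characteristics of the data bound what lossless coding can achieve (this sketch is not from the article; the signals are synthetic stand-ins), the zeroth-order entropy of the sample distribution gives the average number of bits per sample that a memoryless lossless coder needs.

```python
import numpy as np

def entropy_bits_per_sample(samples):
    # Zeroth-order (memoryless) entropy: H = -sum(p * log2(p)) over observed symbol frequencies.
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(1)
noise = rng.integers(-128, 128, size=10000)    # noise-like data: nearly uniform symbol use
tone = np.round(100 * np.sin(2 * np.pi * 0.01 * np.arange(10000))).astype(int)

print(entropy_bits_per_sample(noise))          # close to 8 bits/sample: little to gain losslessly
print(entropy_bits_per_sample(np.diff(tone)))  # a smooth tone's sample differences use few symbols: cheap to code
```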