Lecture 7: Audio Compression and Coding
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Lossless Audio Codec Comparison
Contents Introduction 3 1 CD-audio test 4 1.1 CD's used . .4 1.2 Results all CD's together . .4 1.3 Interesting quirks . .7 1.3.1 Mono encoded as stereo (Dan Browns Angels and Demons) . .7 1.3.2 Compressibility . .9 1.4 Convergence of the results . 10 2 High-resolution audio 13 2.1 Nine Inch Nails' The Slip . 13 2.2 Howard Shore's soundtrack for The Lord of the Rings: The Return of the King . 16 2.3 Wasted bits . 18 3 Multichannel audio 20 3.1 Howard Shore's soundtrack for The Lord of the Rings: The Return of the King . 20 A Motivation for choosing these CDs 23 B Test setup 27 B.1 Scripting and graphing . 27 B.2 Codecs and parameters used . 27 B.3 MD5 checksumming . 28 C Revision history 30 Bibliography 31 2 Introduction While testing the efficiency of lossy codecs can be quite cumbersome (as results differ for each person), comparing lossless codecs is much easier. As the last well documented and comprehensive test available on the internet has been a few years ago, I thought it would be a good idea to update. Beside comparing with CD-audio (which is often done to assess codec performance) and spitting out a grand total, this comparison also looks at extremes that occurred during the test and takes a look at 'high-resolution audio' and multichannel/surround audio. While the comparison was made to update the comparison-page on the FLAC website, it aims to be fair and unbiased. -
Speech Coding and Compression
Introduction to voice coding/compression PCM coding Speech vocoders based on speech production LPC based vocoders Speech over packet networks Speech coding and compression Corso di Networked Multimedia Systems Master Universitario di Primo Livello in Progettazione e Gestione di Sistemi di Rete Carlo Drioli Università degli Studi di Verona Facoltà di Scienze Matematiche, Dipartimento di Informatica Fisiche e Naturali Speech coding and compression Introduction to voice coding/compression PCM coding Speech vocoders based on speech production LPC based vocoders Speech over packet networks Speech coding and compression: OUTLINE Introduction to voice coding/compression PCM coding Speech vocoders based on speech production LPC based vocoders Speech over packet networks Speech coding and compression Introduction to voice coding/compression PCM coding Speech vocoders based on speech production LPC based vocoders Speech over packet networks Introduction Approaches to voice coding/compression I Waveform coders (PCM) I Voice coders (vocoders) Quality assessment I Intelligibility I Naturalness (involves speaker identity preservation, emotion) I Subjective assessment: Listening test, Mean Opinion Score (MOS), Diagnostic acceptability measure (DAM), Diagnostic Rhyme Test (DRT) I Objective assessment: Signal to Noise Ratio (SNR), spectral distance measures, acoustic cues comparison Speech coding and compression Introduction to voice coding/compression PCM coding Speech vocoders based on speech production LPC based vocoders Speech over packet networks -
Digital Speech Processing— Lecture 17
Digital Speech Processing— Lecture 17 Speech Coding Methods Based on Speech Models 1 Waveform Coding versus Block Processing • Waveform coding – sample-by-sample matching of waveforms – coding quality measured using SNR • Source modeling (block processing) – block processing of signal => vector of outputs every block – overlapped blocks Block 1 Block 2 Block 3 2 Model-Based Speech Coding • we’ve carried waveform coding based on optimizing and maximizing SNR about as far as possible – achieved bit rate reductions on the order of 4:1 (i.e., from 128 Kbps PCM to 32 Kbps ADPCM) at the same time achieving toll quality SNR for telephone-bandwidth speech • to lower bit rate further without reducing speech quality, we need to exploit features of the speech production model, including: – source modeling – spectrum modeling – use of codebook methods for coding efficiency • we also need a new way of comparing performance of different waveform and model-based coding methods – an objective measure, like SNR, isn’t an appropriate measure for model- based coders since they operate on blocks of speech and don’t follow the waveform on a sample-by-sample basis – new subjective measures need to be used that measure user-perceived quality, intelligibility, and robustness to multiple factors 3 Topics Covered in this Lecture • Enhancements for ADPCM Coders – pitch prediction – noise shaping • Analysis-by-Synthesis Speech Coders – multipulse linear prediction coder (MPLPC) – code-excited linear prediction (CELP) • Open-Loop Speech Coders – two-state excitation -
Ardour Export Redesign
Ardour Export Redesign Thorsten Wilms [email protected] Revision 2 2007-07-17 Table of Contents 1 Introduction 4 4.5 Endianness 8 2 Insights From a Survey 4 4.6 Channel Count 8 2.1 Export When? 4 4.7 Mapping Channels 8 2.2 Channel Count 4 4.8 CD Marker Files 9 2.3 Requested File Types 5 4.9 Trimming 9 2.4 Sample Formats and Rates in Use 5 4.10 Filename Conflicts 9 2.5 Wish List 5 4.11 Peaks 10 2.5.1 More than one format at once 5 4.12 Blocking JACK 10 2.5.2 Files per Track / Bus 5 4.13 Does it have to be a dialog? 10 2.5.3 Optionally store timestamps 5 5 Track Export 11 2.6 General Problems 6 6 MIDI 12 3 Feature Requests 6 7 Steps After Exporting 12 3.1 Multichannel 6 7.1 Normalize 12 3.2 Individual Files 6 7.2 Trim silence 13 3.3 Realtime Export 6 7.3 Encode 13 3.4 Range ad File Export History 7 7.4 Tag 13 3.5 Running a Script 7 7.5 Upload 13 3.6 Export Markers as Text 7 7.6 Burn CD / DVD 13 4 The Current Dialog 7 7.7 Backup / Archiving 14 4.1 Time Span Selection 7 7.8 Authoring 14 4.2 Ranges 7 8 Container Formats 14 4.3 File vs Directory Selection 8 8.1 libsndfile, currently offered for Export 14 4.4 Container Types 8 8.2 libsndfile, also interesting 14 8.3 libsndfile, rather exotic 15 12 Specification 18 8.4 Interesting 15 12.1 Core 18 8.4.1 BWF – Broadcast Wave Format 15 12.2 Layout 18 8.4.2 Matroska 15 12.3 Presets 18 8.5 Problematic 15 12.4 Speed 18 8.6 Not of further interest 15 12.5 Time span 19 8.7 Check (Todo) 15 12.6 CD Marker Files 19 9 Encodings 16 12.7 Mapping 19 9.1 Libsndfile supported 16 12.8 Processing 19 9.2 Interesting 16 12.9 Container and Encodings 19 9.3 Problematic 16 12.10 Target Folder 20 9.4 Not of further interest 16 12.11 Filenames 20 10 Container / Encoding Combinations 17 12.12 Multiplication 20 11 Elements 17 12.13 Left out 21 11.1 Input 17 13 Credits 21 11.2 Output 17 14 Todo 22 1 Introduction 4 1 Introduction 2 Insights From a Survey The basic purpose of Ardour's export functionality is I conducted a quick survey on the Linux Audio Users to create mixdowns of multitrack arrangements. -
PXC 550 Wireless Headphones
PXC 550 Wireless headphones Instruction Manual 2 | PXC 550 Contents Contents Important safety instructions ...................................................................................2 The PXC 550 Wireless headphones ...........................................................................4 Package includes ..........................................................................................................6 Product overview .........................................................................................................7 Overview of the headphones .................................................................................... 7 Overview of LED indicators ........................................................................................ 9 Overview of buttons and switches ........................................................................10 Overview of gesture controls ..................................................................................11 Overview of CapTune ................................................................................................12 Getting started ......................................................................................................... 14 Charging basics ..........................................................................................................14 Installing CapTune .....................................................................................................16 Pairing the headphones ...........................................................................................17 -
Audio Coding for Digital Broadcasting
Recommendation ITU-R BS.1196-7 (01/2019) Audio coding for digital broadcasting BS Series Broadcasting service (sound) ii Rec. ITU-R BS.1196-7 Foreword The role of the Radiocommunication Sector is to ensure the rational, equitable, efficient and economical use of the radio- frequency spectrum by all radiocommunication services, including satellite services, and carry out studies without limit of frequency range on the basis of which Recommendations are adopted. The regulatory and policy functions of the Radiocommunication Sector are performed by World and Regional Radiocommunication Conferences and Radiocommunication Assemblies supported by Study Groups. Policy on Intellectual Property Right (IPR) ITU-R policy on IPR is described in the Common Patent Policy for ITU-T/ITU-R/ISO/IEC referenced in Resolution ITU-R 1. Forms to be used for the submission of patent statements and licensing declarations by patent holders are available from http://www.itu.int/ITU-R/go/patents/en where the Guidelines for Implementation of the Common Patent Policy for ITU-T/ITU-R/ISO/IEC and the ITU-R patent information database can also be found. Series of ITU-R Recommendations (Also available online at http://www.itu.int/publ/R-REC/en) Series Title BO Satellite delivery BR Recording for production, archival and play-out; film for television BS Broadcasting service (sound) BT Broadcasting service (television) F Fixed service M Mobile, radiodetermination, amateur and related satellite services P Radiowave propagation RA Radio astronomy RS Remote sensing systems S Fixed-satellite service SA Space applications and meteorology SF Frequency sharing and coordination between fixed-satellite and fixed service systems SM Spectrum management SNG Satellite news gathering TF Time signals and frequency standards emissions V Vocabulary and related subjects Note: This ITU-R Recommendation was approved in English under the procedure detailed in Resolution ITU-R 1. -
Real-Time Programming and Processing of Music Signals Arshia Cont
Real-time Programming and Processing of Music Signals Arshia Cont To cite this version: Arshia Cont. Real-time Programming and Processing of Music Signals. Sound [cs.SD]. Université Pierre et Marie Curie - Paris VI, 2013. tel-00829771 HAL Id: tel-00829771 https://tel.archives-ouvertes.fr/tel-00829771 Submitted on 3 Jun 2013 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Realtime Programming & Processing of Music Signals by ARSHIA CONT Ircam-CNRS-UPMC Mixed Research Unit MuTant Team-Project (INRIA) Musical Representations Team, Ircam-Centre Pompidou 1 Place Igor Stravinsky, 75004 Paris, France. Habilitation à diriger la recherche Defended on May 30th in front of the jury composed of: Gérard Berry Collège de France Professor Roger Dannanberg Carnegie Mellon University Professor Carlos Agon UPMC - Ircam Professor François Pachet Sony CSL Senior Researcher Miller Puckette UCSD Professor Marco Stroppa Composer ii à Marie le sel de ma vie iv CONTENTS 1. Introduction1 1.1. Synthetic Summary .................. 1 1.2. Publication List 2007-2012 ................ 3 1.3. Research Advising Summary ............... 5 2. Realtime Machine Listening7 2.1. Automatic Transcription................. 7 2.2. Automatic Alignment .................. 10 2.2.1. -
Lossless Audio Codec Comparison
Contents Introduction 3 1 Test setup 4 1.1 Scripting and graphing . .4 1.2 Codecs and parameters used . .5 1.3 WMA, RealAudio and ALAC . .6 2 CD-audio test 8 2.1 CD's used . .8 2.2 Results all CD's together . .9 2.3 Interesting quirks . 12 2.3.1 Mono encoded as stereo (Dan Browns Angels and Demons) 12 2.4 Convergence of the results . 15 3 High-resolution audio 17 3.1 Nine Inch Nails' The Slip . 17 3.2 Howard Shore's soundtrack for The Lord of the Rings: The Re- turn of the King . 20 3.3 Wasted bits . 22 4 Multichannel audio 24 4.1 Howard Shore's soundtrack for The Lord of the Rings: The Re- turn of the King . 24 A Motivation for choosing these CDs 27 Bibliography 31 2 Introduction While testing the efficiency of lossy codecs can be quite cumbersome (as results differ for each person), comparing lossless codecs is much easier. As the last well documented and comprehensive test available on the internet has been a few years ago, I thought it would be a good idea to update. Beside comparing with CD-audio (which is often done to assess codec perfor- mance) and spitting out a grand total, this comparison also looks at extremes that occurred during the test and takes a look at 'high-resolution audio' and multichannel/surround audio. While the comparison was made to update the comparison-page on the FLAC website, it aims to be fair and unbiased. Because of this, you'll probably won't find anything that looks like conclusions: test results are displayed and analysed, but there is no judgement or choice made. -
Relating Optical Speech to Speech Acoustics and Visual Speech Perception
UNIVERSITY OF CALIFORNIA Los Angeles Relating Optical Speech to Speech Acoustics and Visual Speech Perception A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering by Jintao Jiang 2003 i © Copyright by Jintao Jiang 2003 ii The dissertation of Jintao Jiang is approved. Kung Yao Lieven Vandenberghe Patricia A. Keating Lynne E. Bernstein Abeer Alwan, Committee Chair University of California, Los Angeles 2003 ii Table of Contents Chapter 1. Introduction ............................................................................................... 1 1.1. Audio-Visual Speech Processing ............................................................................ 1 1.2. How to Examine the Relationship between Data Sets ............................................ 4 1.3. The Relationship between Articulatory Movements and Speech Acoustics........... 5 1.4. The Relationship between Visual Speech Perception and Physical Measures ....... 9 1.5. Outline of This Dissertation .................................................................................. 15 Chapter 2. Data Collection and Pre-Processing ...................................................... 17 2.1. Introduction ........................................................................................................... 17 2.2. Background ........................................................................................................... 18 2.3. Recording a Database for the Correlation Analysis............................................. -
(A/V Codecs) REDCODE RAW (.R3D) ARRIRAW
What is a Codec? Codec is a portmanteau of either "Compressor-Decompressor" or "Coder-Decoder," which describes a device or program capable of performing transformations on a data stream or signal. Codecs encode a stream or signal for transmission, storage or encryption and decode it for viewing or editing. Codecs are often used in videoconferencing and streaming media solutions. A video codec converts analog video signals from a video camera into digital signals for transmission. It then converts the digital signals back to analog for display. An audio codec converts analog audio signals from a microphone into digital signals for transmission. It then converts the digital signals back to analog for playing. The raw encoded form of audio and video data is often called essence, to distinguish it from the metadata information that together make up the information content of the stream and any "wrapper" data that is then added to aid access to or improve the robustness of the stream. Most codecs are lossy, in order to get a reasonably small file size. There are lossless codecs as well, but for most purposes the almost imperceptible increase in quality is not worth the considerable increase in data size. The main exception is if the data will undergo more processing in the future, in which case the repeated lossy encoding would damage the eventual quality too much. Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronization of the audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format. -
Speech Compression Using Discrete Wavelet Transform and Discrete Cosine Transform
International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 1 Issue 5, July - 2012 Speech Compression Using Discrete Wavelet Transform and Discrete Cosine Transform Smita Vatsa, Dr. O. P. Sahu M. Tech (ECE Student ) Professor Department of Electronicsand Department of Electronics and Communication Engineering Communication Engineering NIT Kurukshetra, India NIT Kurukshetra, India Abstract reproduced with desired level of quality. Main approaches of speech compression used today are Aim of this paper is to explain and implement waveform coding, transform coding and parametric transform based speech compression techniques. coding. Waveform coding attempts to reproduce Transform coding is based on compressing signal input signal waveform at the output. In transform by removing redundancies present in it. Speech coding at the beginning of procedure signal is compression (coding) is a technique to transform transformed into frequency domain, afterwards speech signal into compact format such that speech only dominant spectral features of signal are signal can be transmitted and stored with reduced maintained. In parametric coding signals are bandwidth and storage space respectively represented through a small set of parameters that .Objective of speech compression is to enhance can describe it accurately. Parametric coders transmission and storage capacity. In this paper attempts to produce a signal that sounds like Discrete wavelet transform and Discrete cosine original speechwhether or not time waveform transform -
21065L Audio Tutorial
a Using The Low-Cost, High Performance ADSP-21065L Digital Signal Processor For Digital Audio Applications Revision 1.0 - 12/4/98 dB +12 0 -12 Left Right Left EQ Right EQ Pan L R L R L R L R L R L R L R L R 1 2 3 4 5 6 7 8 Mic High Line L R Mid Play Back Bass CNTR 0 0 3 4 Input Gain P F R Master Vol. 1 2 3 4 5 6 7 8 Authors: John Tomarakos Dan Ledger Analog Devices DSP Applications 1 Using The Low Cost, High Performance ADSP-21065L Digital Signal Processor For Digital Audio Applications Dan Ledger and John Tomarakos DSP Applications Group, Analog Devices, Norwood, MA 02062, USA This document examines desirable DSP features to consider for implementation of real time audio applications, and also offers programming techniques to create DSP algorithms found in today's professional and consumer audio equipment. Part One will begin with a discussion of important audio processor-specific characteristics such as speed, cost, data word length, floating-point vs. fixed-point arithmetic, double-precision vs. single-precision data, I/O capabilities, and dynamic range/SNR capabilities. Comparisions between DSP's and audio decoders that are targeted for consumer/professional audio applications will be shown. Part Two will cover example algorithmic building blocks that can be used to implement many DSP audio algorithms using the ADSP-21065L including: Basic audio signal manipulation, filtering/digital parametric equalization, digital audio effects and sound synthesis techniques. TABLE OF CONTENTS 0. INTRODUCTION ................................................................................................................................................................4 1.