Perceptual Audio Coding Contents

Total Page:16

File Type:pdf, Size:1020Kb

Perceptual Audio Coding Contents 12/6/2007 Perceptual Audio Coding Henrique Malvar Managing Director, Redmond Lab UW Lecture – December 6, 2007 Contents • Motivation • “Source coding”: good for speech • “Sink coding”: Auditory Masking • Block & Lapped Transforms • Audio compression •Examples 2 1 12/6/2007 Contents • Motivation • “Source coding”: good for speech • “Sink coding”: Auditory Masking • Block & Lapped Transforms • Audio compression •Examples 3 Many applications need digital audio • Communication – Digital TV, Telephony (VoIP) & teleconferencing – Voice mail, voice annotations on e-mail, voice recording •Business – Internet call centers – Multimedia presentations • Entertainment – 150 songs on standard CD – thousands of songs on portable music players – Internet / Satellite radio, HD Radio – Games, DVD Movies 4 2 12/6/2007 Contents • Motivation • “Source coding”: good for speech • “Sink coding”: Auditory Masking • Block & Lapped Transforms • Audio compression •Examples 5 Linear Predictive Coding (LPC) LPC periodic excitation N coefficients x()nen= ()+−∑ axnrr ( ) gains r=1 pitch period e(n) Synthesis x(n) Combine Filter noise excitation synthesized speech 6 3 12/6/2007 LPC basics – analysis/synthesis synthesis parameters Analysis Synthesis algorithm Filter residual waveform N en()= xn ()−−∑ axnr ( r ) r=1 original speech synthesized speech 7 LPC variant - CELP selection Encoder index LPC original gain coefficients speech . Synthesis . Filter Decoder excitation codebook synthesized speech 8 4 12/6/2007 LPC variant - multipulse LPC coefficients excitation Synthesis Compute pulse Filter positions and amplitudes original speech synthesized speech 9 G.723.1 architecture Audio Input FRAMER Synthesis Filter Coefficients LPC ANALYSIS HIGHPASS FILTER ZERO-INPUT RESPONSE SIMULATED PERCEPTUAL (MEMORY) DECODER NOISE SHAPING Past Decoded Weighted Residual Residual Excitation + – MP-MLQ/ACELP Indices PITCH + EXCITATION COMPUTATION ESTIMATION Encoded Pitch Values 10 5 12/6/2007 Contents • Motivation • “Source coding”: good for speech • “Sink coding”: Auditory Masking • Block & Lapped Transforms • Audio compression •Examples 11 Physiology of the ear • Automatic gain control – muscles around transmission bones • Directivity – pinna TRANSMISSION EARDRUM BONES • Boost of middle frequencies PINNA AUDITORY NERVE – auditory canal AUDITORY CANAL • Nonlinear processing – auditory nerve COCHLEA • Filter bank separation EUSTACHIAN –cochlea TUBE • Thousands of “microphones” – hair cells in cochlea 12 6 12/6/2007 Filter bank model linear linear nonlinear A1(m1) x(n) bandpass group into amplitude amplitude, filter 1 blocks measurement band 1 incoming sound A2(m2) bandpass group into amplitude amplitude, filter 2 blocks measurement band 2 . AN(mN) bandpass group into amplitude amplitude, filter N blocks measurement band N • Explains frequency-domain masking 13 Frequency-domain masking 60 40 20 de, dB de, u 0 tone 1 -20 amplit tone 2 -40 -60 2 3 4 tone 3 10 10 10 frequency, Hz 60 tone 1+3 40 20 tone 1+2+3 e, dB 0 -20 12 3 amplitud -40 -60 2 3 4 10 10 10 frequency, Hz 14 7 12/6/2007 Absolute threshold of hearing • Fletcher-Munson curves 80 70 60 50 40 30 20 10 0 -10 2 3 4 10 10 10 frequency, Hz • Basis for loudness correction in audio amplifiers 15 Example of masking •Typical spectrum & masking threshold • Original sound: amplitude, dB amplitude, • Sound after removing components below the threshold (1/3 to 1/2 of the data): 16 8 12/6/2007 Contents • Motivation • “Source coding”: good for speech • “Sink coding”: Auditory Masking • Block & Lapped Transforms • Audio compression •Examples 17 Block signal processing x Direct T Extract X = P x Orthogonal Bloc k Input Transform Signal ~ X Processing Output Inverse ~ ~ x = PX Append Signal Orthogonal Bloc k Transform Signal is reconstructed as a linear combination of basis functions 18 9 12/6/2007 Block processing: good and bad • Pro: allows adaptability • Con: blocking artifacts 19 Why transforms? • More efficient signal representation –Freqqyuency domain – Basis functions ~ “typical” signal components • Faster processing – Filtering, compression •Orthoggyonality – Energy preservation – Robustness to quantization 20 10 12/6/2007 Compactness of representation • Maximum energy concentration in as few coefficients as possible • For stationary random signals, the optimal basis is the Karhunen-Loève transform: T λiipRp==xxi, PP I • Basis functions are the columns of P • Minimum geometric mean of transform coefficient variances 21 Sub-optimal transforms •KLT problems: – Signal dependency – P not factorable into sparse components • Sinusoidal transforms: – Asymptotically optimal for large blocks –Frequency component iiinterpretation – Sparse factors - e.g. FFT 22 11 12/6/2007 Lapped transforms • Basis functions have tails beyond block boundaries – Linear combinations of overlapping functions such as generate smooth signals, without blocking artifacts 23 Modulated lapped transforms • Basis functions = cosines modulating the same low-pass (window) prototype h(n): 21M + 1 π pna f =+ hna f cosLF n IF k + I O k M NMH 2 KH 2 K M QP • Can be computed from the DCT or FFT T • Projection X = P x can be computed in Oblog2 Mg operations per input point 24 12 12/6/2007 Fast MLT computation M-sample one-block delay block x()n −hna() z−M uM(/2+ n ) . ha()M−1−n Uk() X ()k uM(/21− − n ) MLT coefficients x()M −−1 n hna() input signal DCT-IV windowing transform output signal −hns () yn() w(M /2+n) . Yk() hMs ()−−1 n . processed wM(/21− − n ) z−M MLT y()M −1−n coefficients one-block hns () M-sample DCT-IV delay block transform windowing 25 Basis functions DCT: MLT: 26 13 12/6/2007 Contents • Motivation • “Source coding”: good for speech • “Sink coding”: Auditory Masking • Block & Lapped Transforms • Audio compression •Examples 27 Basic architecture TIME-TO- FREQUENCY SIDE INFO TRANSFORM ORIGINAL (MLT) AUDIO SIGNAL ENCODED BITSTREAM MASKING MUX WEIGHTING THRESHOLD SI SPECTRUM SI UNIFORM ENTROPY QUANTIZER ENCODER 28 14 12/6/2007 Quantization of transform coefficients • Quantization = rounding to nearest integer. • Small range of integer values = fewer bits needed to represent data • Step size T controls range of integer values y are mapped y = T int(x /T ) to this value x all values in this range … 29 Encoding of quantized coefficients • Typical plot of quantized transform coefficients 2 1 0 Quantized value Quantized -1 -2 0 1000 2000 3000 4000 5000 6000 Freq uency, Hz • Run-length + entropy coding 30 15 12/6/2007 Basic entropy coding Value Codeword • Huffman coding: less -7 '1010101010001' -6 '10101010101' frequent values have -5 '101010100' longer codewords -4 '10101011' -3 '101011' -2 '1011' • More efficient if -1 '01' groups of values are 0 '11' assembled in a vector +1 '00' before coding +2 '100' +3 '10100' +4 '1010100' +5 '1010101011' +6 '101010101001' +7 '1010101010000' 31 Side information & more about EC • Side info: model of frequency spectrum – e.g. averages over subbands • Quantized spectral model determines weighting – masking level used to scale coefficients • Backward adaptation reduces need for SI • Run-length + Vector Huffman works – Context-based AC can be better – Room for better context models via machine learning? 32 16 12/6/2007 Improved architecture TIME-TO- FREQUENCY SIDE INFO TRANSFORM ORIGINAL (MLT) AUDIO SIGNAL ENCODED BITSTREAM MASKING MUX WEIGHTING THRESHOLD SI SPECTRUM CONTEXT MODELING SI SI UNIFORM ENTROPY QUANTIZER ENCODER 33 Examples of context modeling • For strongly voiced segments, spectral energies may be well predicted by a “Linear Prediction” model, similar to those used in VoIP coders. • For strongly periodic components, spectral energies may be predicted by a pitch model. • For noisy segments, a noise-only model may allow for very coarse quantization Æ lower data rate. 34 17 12/6/2007 Other aspects & directions •Stereo coding – (L+R)/2 & L-R coding, expandable to multichannel – Intensity + balance coding – Mode switching – extra work for encoder only • Lossless coding – Easily achievable via integer transforms – exactly reversible via integer arithmetic – example: lifting-based MLT (see Refs) • Using complex subband decompositions (MCLT) – Potential for more sophisticated auditory models – Efficient encoding is an open problem 35 Audio coding standards MPEG-1 Layer III (MP3) · MPEG-1 Layer II · MPEG-1 ISO/IEC Layer I · AAC · HE-AAC · HE-AAC v2 G.711 · G.722 · G.722.1 · G.722.2 · G.723 · ITU-T G.723.1 · G.726 · G.728 · G.729 · G.729.1 · G.729ª AC3 · AMR · Apple Lossless · ATRAC · FLAC · iLBC · Monkey's Audio · μ-law · Musepack · Nellymoser · Others OptimFROG · RealAudio · RTAudio · SHN · Speex · Vorbis · WavPack · WMA · TAK From http://en.wikipedia.org/wiki/Advanced_Audio_Coding 36 18 12/6/2007 Contents • Motivation • “Source coding”: good for speech • “Sink coding”: Auditory Masking • Block & Lapped Transforms • Audio compression •Examples 37 WMA examples: •Original clip (~1,400 kbps) 64 kbps (MP3) 64 kbps (WMA) • Original clip WMA @ 32 kbps (Internet radio) • More examples at http://www.microsoft.com/windows/windowsmedia/demos/audio_quality_demos.aspx 38 19 12/6/2007 References • S. Shlien, “The modulated lapped transform, its time-varying forms, and its applications to audio coding standards,” IEEE Trans. Speech and Audio Processing, vol. 5, pp. 359-366, July 1997. • H. S. Malvar, “Fast Algorithms for Orthogonal and Biorthogonal Modulated Lapped Transforms,” IEEE Symposium Advances Digital Filtering and Signal Processing, Victoria, Canada, pp. 159–163, June 1998. • H. S. Malvar, “Enhancing the performance of subband
Recommended publications
  • Proceedings 2005
    LAC2005 Proceedings 3rd International Linux Audio Conference April 21 – 24, 2005 ZKM | Zentrum fur¨ Kunst und Medientechnologie Karlsruhe, Germany Published by ZKM | Zentrum fur¨ Kunst und Medientechnologie Karlsruhe, Germany April, 2005 All copyright remains with the authors www.zkm.de/lac/2005 Content Preface ............................................ ............................5 Staff ............................................... ............................6 Thursday, April 21, 2005 – Lecture Hall 11:45 AM Peter Brinkmann MidiKinesis – MIDI controllers for (almost) any purpose . ....................9 01:30 PM Victor Lazzarini Extensions to the Csound Language: from User-Defined to Plugin Opcodes and Beyond ............................. .....................13 02:15 PM Albert Gr¨af Q: A Functional Programming Language for Multimedia Applications .........21 03:00 PM St´ephane Letz, Dominique Fober and Yann Orlarey jackdmp: Jack server for multi-processor machines . ......................29 03:45 PM John ffitch On The Design of Csound5 ............................... .....................37 04:30 PM Pau Arum´ıand Xavier Amatriain CLAM, an Object Oriented Framework for Audio and Music . .............43 Friday, April 22, 2005 – Lecture Hall 11:00 AM Ivica Ico Bukvic “Made in Linux” – The Next Step .......................... ..................51 11:45 AM Christoph Eckert Linux Audio Usability Issues .......................... ........................57 01:30 PM Marije Baalman Updates of the WONDER software interface for using Wave Field Synthesis . 69 02:15 PM Georg B¨onn Development of a Composer’s Sketchbook ................. ....................73 Saturday, April 23, 2005 – Lecture Hall 11:00 AM J¨urgen Reuter SoundPaint – Painting Music ........................... ......................79 11:45 AM Michael Sch¨uepp, Rene Widtmann, Rolf “Day” Koch and Klaus Buchheim System design for audio record and playback with a computer using FireWire . 87 01:30 PM John ffitch and Tom Natt Recording all Output from a Student Radio Station .
    [Show full text]
  • Fast Computational Structures for an Efficient Implementation of The
    ARTICLE IN PRESS Signal Processing ] (]]]]) ]]]–]]] Contents lists available at ScienceDirect Signal Processing journal homepage: www.elsevier.com/locate/sigpro Fast computational structures for an efficient implementation of the complete TDAC analysis/synthesis MDCT/MDST filter banks Vladimir Britanak a,Ã, Huibert J. Lincklaen Arrie¨ns b a Institute of Informatics, Slovak Academy of Sciences, Dubravska cesta 9, 845 07 Bratislava, Slovak Republic b Delft University of Technology, Department of Electrical Engineering, Mathematics and Computer Science, Mekelweg 4, 2628 CD Delft, The Netherlands article info abstract Article history: A new fast computational structure identical both for the forward and backward Received 27 May 2008 modified discrete cosine/sine transform (MDCT/MDST) computation is described. It is Received in revised form the result of a systematic construction of a fast algorithm for an efficient implementa- 8 January 2009 tion of the complete time domain aliasing cancelation (TDAC) analysis/synthesis MDCT/ Accepted 10 January 2009 MDST filter banks. It is shown that the same computational structure can be used both for the encoder and the decoder, thus significantly reducing design time and resources. Keywords: The corresponding generalized signal flow graph is regular and defines new sparse Modified discrete cosine transform matrix factorizations of the discrete cosine transform of type IV (DCT-IV) and MDCT/ Modified discrete sine transform MDST matrices. The identical fast MDCT computational structure provides an efficient Modulated
    [Show full text]
  • LOW COMPLEXITY H.264 to VC-1 TRANSCODER by VIDHYA
    LOW COMPLEXITY H.264 TO VC-1 TRANSCODER by VIDHYA VIJAYAKUMAR Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE IN ELECTRICAL ENGINEERING THE UNIVERSITY OF TEXAS AT ARLINGTON AUGUST 2010 Copyright © by Vidhya Vijayakumar 2010 All Rights Reserved ACKNOWLEDGEMENTS As true as it would be with any research effort, this endeavor would not have been possible without the guidance and support of a number of people whom I stand to thank at this juncture. First and foremost, I express my sincere gratitude to my advisor and mentor, Dr. K.R. Rao, who has been the backbone of this whole exercise. I am greatly indebted for all the things that I have learnt from him, academically and otherwise. I thank Dr. Ishfaq Ahmad for being my co-advisor and mentor and for his invaluable guidance and support. I was fortunate to work with Dr. Ahmad as his research assistant on the latest trends in video compression and it has been an invaluable experience. I thank my mentor, Mr. Vishy Swaminathan, and my team members at Adobe Systems for giving me an opportunity to work in the industry and guide me during my internship. I would like to thank the other members of my advisory committee Dr. W. Alan Davis and Dr. William E Dillon for reviewing the thesis document and offering insightful comments. I express my gratitude Dr. Jonathan Bredow and the Electrical Engineering department for purchasing the software required for this thesis and giving me the chance to work on cutting edge technologies.
    [Show full text]
  • Document Title
    Video-conference system based on open source software Emanuel Frederico Barreiros Castro da Silva Dissertation submitted to obtain the Master Degree in Information Systems and Computer Engineering Jury Chairman: Prof. Doutor Joao Paulo Marques da Silva Supervisor: Prof. Doutor Nuno Filipe Valentim Roma Co-supervisor: Prof. Doutor Pedro Filipe Zeferino Tomas Member: Prof. Doutor Luis Manuel Antunes Veiga June 2012 Acknowledgments I would like to thank Dr. Nuno Roma for the patience in guiding me through all the research, implementation and writing phase of this thesis. I would also like to thank Dr. Pedro Tomas´ for the Latex style sheet for this document and for the final comments and corrections on this thesis. I would also like to thank all my colleagues that helped me and supported me in hard times when i though it was impossible to implement the proposed work and was thinking in giving up. I would like to mention: Joao˜ Lima, Joao˜ Cabral, Pedro Magalhaes˜ and Jose´ Rua for all the support! Joao˜ Lima and Joao˜ Cabral were also fundamental for me, since they discussed different possibilities in the proposed solution with me and gave me fundamental support and feedback on my work! You guys were really the best friends i could had in the time being! I do not have enough thank words for you! Andre´ Mateus should also be mentioned due to all the patience and support for installing, uninstalling, installing, uninstalling and installing different Virtual Machines with Linux some more times. I would like to thank my family, specially my mother, for all the support and care they gave me for almost two years.
    [Show full text]
  • A Game Engine for the Blind
    Bachelor in Computer Science and Engineering 2016/2017 Bachelor Thesis Designing User Experiences: a Game Engine for the Blind Álvaro Cáceres Muñoz Tutor/s: Teresa Onorati Thesis defense date: July 3rd, 2017 This work is subject to the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License. Abstract Video games experience an ever-increasing interest by society since their incep- tion on the 70’s. This form of computer entertainment may let the player have a great time with family and friends, or it may as well provide immersion into a story full of details and emotional content. Prior to the end user playing a video game, a huge effort is performed in lots of disciplines: screenwriting, scenery design, graphical design, programming, opti- mization or marketing are but a few examples. This work is done by game studios, where teams of professionals from different backgrounds join forces in the inception of the video game. From the perspective of Human-Computer Interaction, which studies how people interact with computers to complete tasks [9], a game developer can be regarded as a user whose task is to create the logic of a video game using a computer. One of the main foundations of HCI1. is that an in-depth understanding of the user’s needs and preferences is vital for creating a usable piece of technology. This point is important as a single piece of technology (in this case, the set of tools used by a game developer) may – and should have been designed to – be used on the same team by users with different knowledge, abilities and capabilities.
    [Show full text]
  • DCP-O-Matic Users' Manual
    DCP-o-matic users’ manual Carl Hetherington DCP-o-matic users’ manual ii Contents 1 Introduction 1 1.1 What is DCP-o-matic?.............................................1 1.2 Licence.....................................................1 1.3 Acknowledgements...............................................1 1.4 This manual...................................................1 2 Installation 2 2.1 Windows....................................................2 2.2 Mac OS X....................................................2 2.3 Debian, Ubuntu or Mint Linux........................................2 2.4 Fedora, Centos and Mageia Linux.......................................2 2.5 Arch Linux...................................................3 2.6 Other Linux distributions...........................................3 2.7 ‘Simple’ and ‘Full’ modes...........................................4 3 Creating a DCP from a video 5 3.1 Creating a new film..............................................5 3.2 Adding content.................................................6 3.3 Making the DCP................................................7 4 Creating a DCP from a still image9 5 Manipulating existing DCPs 12 5.1 Importing a DCP into DCP-o-matic..................................... 12 5.2 Decrypting encrypted DCPs.......................................... 12 5.3 Making a DCP from a DCP.......................................... 12 5.3.1 Re-use of existing data......................................... 13 5.3.2 Making overlay files.........................................
    [Show full text]
  • Using Daala Intra Frames for Still Picture Coding
    Using Daala Intra Frames for Still Picture Coding Nathan E. Egge, Jean-Marc Valin, Timothy B. Terriberry, Thomas Daede, Christopher Montgomery Mozilla Mountain View, CA, USA Abstract—Recent advances in video codec technology have techniques. Unsignaled horizontal and vertical prediction (see been successfully applied to the field of still picture coding. Daala Section II-D), and low complexity prediction of chroma coef- is a new royalty-free video codec under active development that ficients from their spatially coincident luma coefficients (see contains several novel coding technologies. We consider using Daala intra frames for still picture coding and show that it is Section II-E) are two examples. competitive with other video-derived image formats. A finished version of Daala could be used to create an excellent, royalty-free image format. I. INTRODUCTION The Daala video codec is a joint research effort between Xiph.Org and Mozilla to develop a next-generation video codec that is both (1) competitive performance with the state- of-the-art and (2) distributable on a royalty free basis. In work- ing to produce a royalty free video codec, Daala introduces new coding techniques to replace those used in the traditional block-based video codec pipeline. These techniques, while not Fig. 1. State of blocks within the decode pipeline of a codec using lapped completely finished, already demonstrate that it is possible to transforms. Immediate neighbors of the target block (bold lines) cannot be deliver a video codec that meets these goals. used for spatial prediction as they still require post-filtering (dotted lines). All video codecs have, in some form, the ability to code a still frame without prior information.
    [Show full text]
  • Format De Données
    Format de Cannées • Hfila Perls 2.95Cd nes 1015/0912.99 PHI Format de données Un article de Wild Paris Descartes. Des clés pour comprendre l'Université numérique Accès per catégories au glossaire : Accès thématique HYPERGLOSSAIFtE :ABCDEF GH IJKLMN OP QRSTUVWXYZ Format de données Sommaire Le format des données est la représentation informatique (sous forme de nombres binaires) des données (texte, image, vidéo, audio, archivage et myptage, exécutable) en 1 Format de données vue de les stocker, de les traiter, de les manipuler. Les données peuvent être ainsi • 1.1 Format ouvert échangées entre divers programmes informatiques ou logiciels (qui les lisent et les • 1.2 Format libre interprètent), soit par une connexion directe soit par l'intermédiaire d'un fichier. • 1.3 Format fermé • 1.4 Format Le format de fichier est le type de codage utilisé pour coder les données contenues dans propriétaire un fichier. Il est identifié par l'extension qui suit le nom du fichier. • 1.5 Format normalisé Les fichiers d'ordinateurs peuvent être de deux types : • 1.6 Format conteneur • 2 Les formas texte • Les fichiers ASCII : ils contiennent des caractères qui respectent les codes • 2.1 Bureautique ASCII. Ils sont donc lisibles (éditables dans n'importe quel éditeur de texte). • 3 Les formats d'image • Les fichiers binaires ils contiennent des informations codées directement en • 4 Les formats audio binaire (des 0 et des l). Ces fichiers sont lisibles dans un éditeur de texte, mais • 5 Les formats vidéo complètement déchiffrables par la spécification publique fournie par le producteur. • 6 DR/yr Pour afficher ou exécuter un fichier binaire, il faut donc utiliser un logiciel • 7 Liens pour approfondir compatible avec le logiciel qui a produit ce fichier.
    [Show full text]
  • Lapped Transforms in Perceptual Coding of Wideband Audio
    Lapped Transforms in Perceptual Coding of Wideband Audio Sien Ruan Department of Electrical & Computer Engineering McGill University Montreal, Canada December 2004 A thesis submitted to McGill University in partial fulfillment of the requirements for the degree of Master of Engineering. c 2004 Sien Ruan ° i To my beloved parents ii Abstract Audio coding paradigms depend on time-frequency transformations to remove statistical redundancy in audio signals and reduce data bit rate, while maintaining high fidelity of the reconstructed signal. Sophisticated perceptual audio coding further exploits perceptual redundancy in audio signals by incorporating perceptual masking phenomena. This thesis focuses on the investigation of different coding transformations that can be used to compute perceptual distortion measures effectively; among them the lapped transform, which is most widely used in nowadays audio coders. Moreover, an innovative lapped transform is developed that can vary overlap percentage at arbitrary degrees. The new lapped transform is applicable on the transient audio by capturing the time-varying characteristics of the signal. iii Sommaire Les paradigmes de codage audio d´ependent des transformations de temps-fr´equence pour enlever la redondance statistique dans les signaux audio et pour r´eduire le taux de trans- mission de donn´ees, tout en maintenant la fid´elit´e´elev´ee du signal reconstruit. Le codage sophistiqu´eperceptuel de l’audio exploite davantage la redondance perceptuelle dans les signaux audio en incorporant des ph´enom`enes de masquage perceptuels. Cette th`ese se concentre sur la recherche sur les diff´erentes transformations de codage qui peuvent ˆetre employ´ees pour calculer des mesures de d´eformation perceptuelles efficacement, parmi elles, la transformation enroul´e, qui est la plus largement r´epandue dans les codeurs audio de nos jours.
    [Show full text]
  • Volume 2 – Vidéo Sous Linux
    Volume 2 – Vidéo sous linux Installation des outils vidéo V6.3 du 20 mars 2020 Par Olivier Hoarau ([email protected]) Vidéo sous linux Volume 1 - Installation des outils vidéo Volume 2 - Tutoriel Kdenlive Volume 3 - Tutoriel cinelerra Volume 4 - Tutoriel OpenShot Video Editor Volume 5 - Tutoriel LiVES Table des matières 1 HISTORIQUE DU DOCUMENT................................................................................................................................4 2 PRÉAMBULE ET LICENCE......................................................................................................................................4 3 PRÉSENTATION ET AVERTISSEMENT................................................................................................................5 4 DÉFINITIONS ET AUTRES NOTIONS VIDÉO......................................................................................................6 4.1 CONTENEUR................................................................................................................................................................6 4.2 CODEC.......................................................................................................................................................................6 5 LES OUTILS DE BASE POUR LA VIDÉO...............................................................................................................7 5.1 PRÉSENTATION.............................................................................................................................................................7
    [Show full text]
  • Daala: a Perceptually-Driven Still Picture Codec
    DAALA: A PERCEPTUALLY-DRIVEN STILL PICTURE CODEC Jean-Marc Valin, Nathan E. Egge, Thomas Daede, Timothy B. Terriberry, Christopher Montgomery Mozilla, Mountain View, CA, USA Xiph.Org Foundation ABSTRACT and vertical directions [1]. Also, DC coefficients are com- bined recursively using a Haar transform, up to the level of Daala is a new royalty-free video codec based on perceptually- 64x64 superblocks. driven coding techniques. We explore using its keyframe format for still picture coding and show how it has improved over the past year. We believe the technology used in Daala Multi-Symbol Entropy Coder could be the basis of an excellent, royalty-free image format. Most recent video codecs encode information using binary arithmetic coding, meaning that each symbol can only take 1. INTRODUCTION two values. The Daala range coder supports up to 16 values per symbol, making it possible to encode fewer symbols [6]. Daala is a royalty-free video codec designed to avoid tra- This is equivalent to coding up to four binary values in parallel ditional patent-encumbered techniques used in most cur- and reduces serial dependencies. rent video codecs. In this paper, we propose to use Daala’s keyframe format for still picture coding. In June 2015, Daala was compared to other still picture codecs at the 2015 Picture Perceptual Vector Quantization Coding Symposium (PCS) [1,2]. Since then many improve- Rather than use scalar quantization like the vast majority of ments were made to the bitstream to improve its quality. picture and video codecs, Daala is based on perceptual vector These include reduced overlap in the lapped transform, finer quantization (PVQ) [7].
    [Show full text]
  • Dialogic Diva System Release 8.5WIN Reference Guide
    Dialogic® Diva® System Release 8.5WIN Service Update Reference Guide February 2012 206-339-34 www.dialogic.com Dialogic® Diva® System Release 8.5 WIN Reference Guide Copyright and Legal Notice Copyright © 2000-2012 Dialogic Inc. All Rights Reserved. You may not reproduce this document in whole or in part without permission in writing from Dialogic Inc. at the address provided below. All contents of this document are furnished for informational use only and are subject to change without notice and do not represent a commitment on the part of Dialogic Inc. and its affiliates or subsidiaries (“Dialogic”). Reasonable effort is made to ensure the accuracy of the information contained in the document. However, Dialogic does not warrant the accuracy of this information and cannot accept responsibility for errors, inaccuracies or omissions that may be contained in this document. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH DIALOGIC® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN A SIGNED AGREEMENT BETWEEN YOU AND DIALOGIC, DIALOGIC ASSUMES NO LIABILITY WHATSOEVER, AND DIALOGIC DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF DIALOGIC PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY INTELLECTUAL PROPERTY RIGHT OF A THIRD PARTY. Dialogic products are not intended for use in certain safety-affecting situations. Please see http://www.dialogic.com/company/terms-of-use.aspx for more details. Due to differing national regulations and approval requirements, certain Dialogic products may be suitable for use only in specific countries, and thus may not function properly in other countries.
    [Show full text]