6.003: Signal Processing


Fourier-Based Audio Compression

• Review of Lossy Compression, Discrete Cosine Transform (DCT)
• Brief Introduction to MDCT
• Additional Considerations for Audio Encoding

2 May 2019

Today: Lossy Compression

As opposed to “lossless” compression (LZW, Huffman, zip, gzip, xzip, ...), “lossy” compression achieves a decrease in file size by throwing away information from the original signal.

Goal: convey the “important” parts of the signal using as few bits as possible.

Lossy Compression

Key idea: throw away the “unimportant” bits (i.e., bits that won’t be noticed). Doing this involves knowing something about what it means for something to be noticeable.

Many aspects of human perception are frequency based, so many lossy formats use frequency-based methods (along with models of human perception).

Lossy Compression: High-level View

To Encode:

• Split signal into “frames”
• Transform each frame into Fourier representation
• Throw away (or attenuate) some coefficients
• Additional lossless compression (LZW, RLE, Huffman, etc.)

To Decode:

• Undo lossless compression
• Transform each frame into time/spatial representation

This is pretty standard! Both JPEG and MP3, for example, work roughly this way.

Given this, one goal is to get the “important” information in a signal into relatively few coefficients in the frequency domain (“energy compaction”).

Energy Compaction

It turns out the DFT has some problems with energy compaction. Consider the following signal, broken into 8-sample-long frames:

[Figure: the original signal and one 8-sample “frame”, each plotted against n]

Why is the DFT undesirable in this case, given our goal of compression?

Discrete Cosine Transform

It is much more common to use the DCT (Discrete Cosine Transform) in compression applications. The DCT (or a variant thereof) is used in JPEG, AAC, Vorbis, WMA, MP3, ....
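The energy-compaction question above can be explored numerically. The following is a minimal sketch, not the course's own code: it takes an 8-sample ramp frame, keeps only the three largest-magnitude coefficients of its DFT and of its DCT-II, and compares the reconstruction errors. The 1/N scaling on both transforms, the ramp test frame, the choice of three coefficients, and the helper names `dct2`, `idct2`, and `keep_largest` are all assumptions made for illustration.

```python
import numpy as np


def dct2(x):
    """DCT-II with 1/N scaling: XC[k] = (1/N) sum_n x[n] cos(pi/N (n + 1/2) k)."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    basis = np.cos(np.pi / N * (n + 0.5) * k)  # rows indexed by k, columns by n
    return basis @ x / N


def idct2(XC):
    """Inverse of dct2: x[n] = XC[0] + 2 sum_{k>=1} XC[k] cos(pi/N (n + 1/2) k)."""
    N = len(XC)
    n = np.arange(N)
    k = n[:, None]
    basis = np.cos(np.pi / N * (n + 0.5) * k)
    weights = np.full(N, 2.0)
    weights[0] = 1.0
    return (weights * XC) @ basis


def keep_largest(coeffs, m):
    """Zero out everything except the m largest-magnitude coefficients."""
    kept = np.zeros_like(coeffs)
    idx = np.argsort(np.abs(coeffs))[-m:]
    kept[idx] = coeffs[idx]
    return kept


N = 8
m = 3                           # number of coefficients kept (illustrative choice)
x = np.arange(N, dtype=float)   # one 8-sample frame of a ramp

# DFT with 1/N scaling in the analysis direction, then undo the scaling to invert.
X = np.fft.fft(x) / N
x_dft = np.fft.ifft(keep_largest(X, m) * N).real

# DCT-II of the same frame.
XC = dct2(x)
x_dct = idct2(keep_largest(XC, m))

# Relative squared reconstruction error for each transform.
dft_err = np.sum((x - x_dft) ** 2) / np.sum(x ** 2)
dct_err = np.sum((x - x_dct) ** 2) / np.sum(x ** 2)
print("relative error, DFT:", dft_err)
print("relative error, DCT:", dct_err)
```

For this frame the DCT error comes out far smaller than the DFT error. The DFT implicitly treats the frame as one period of a periodic signal, so the jump from the last sample back to the first spreads energy across many coefficients; the DCT's implied mirrored extension has no such jump.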
The DCT (more formally, the DCT-II) is defined by:

$$X_C[k] = \frac{1}{N}\sum_{n=0}^{N-1} x[n]\cos\left(\frac{\pi}{N}\left(n+\frac{1}{2}\right)k\right)$$

DCT: Relationship to DFT

$$\begin{aligned}
X_C[k] &= \frac{1}{N}\sum_{n=0}^{N-1} x[n]\cos\left(\frac{\pi}{N}\left(n+\frac{1}{2}\right)k\right)\\
&= \frac{1}{2N}\sum_{n=0}^{N-1} x[n]\left(e^{j\frac{\pi}{N}\left(n+\frac{1}{2}\right)k} + e^{-j\frac{\pi}{N}\left(n+\frac{1}{2}\right)k}\right)\\
&= \frac{1}{2N}\,e^{-j\frac{\pi}{N}\frac{1}{2}k}\sum_{n=0}^{N-1} x[n]\left(e^{j\frac{\pi}{N}(n+1)k} + e^{-j\frac{\pi}{N}nk}\right)\\
&= \frac{1}{2N}\,e^{-j\frac{\pi}{N}\frac{1}{2}k}\left(\sum_{n=0}^{N-1} x[n]\,e^{-j\frac{2\pi}{2N}(-n-1)k} + \sum_{n=0}^{N-1} x[n]\,e^{-j\frac{2\pi}{2N}nk}\right)\\
&= \frac{1}{2N}\,e^{-j\frac{\pi}{N}\frac{1}{2}k}\sum_{n=-N}^{N-1}\tilde{x}[n]\,e^{-j\frac{2\pi}{2N}nk} = e^{-j\frac{\pi}{N}\frac{1}{2}k}\,\tilde{X}[k]
\end{aligned}$$

where $\tilde{x}[\cdot]$ is given by the following, and the DFT coefficients $\tilde{X}[\cdot]$ are computed with an analysis window of length 2N:

$$\tilde{x}[n] = \tilde{x}[n+2N] = \begin{cases} x[n] & \text{if } 0 \le n < N \\ x[-n-1] & \text{if } -N \le n < 0 \end{cases}$$

Discrete Cosine Transform

The DCT is commonly used in compression applications. We can think about computing the DCT by first putting a mirrored copy of a windowed signal next to itself, and then computing the DFT of that new signal (shifted by 1/2 sample):

[Figure: an 8-sample “frame” and the corresponding 16-sample shifted, mirrored frame, each plotted against n]

Why is the DCT more appropriate, given our goals? How does this approach fix the issue(s) we saw with the DFT?

The Discrete Cosine Transform

$$X_C[k] = \frac{1}{N}\sum_{n=0}^{N-1} x[n]\cos\left(\frac{\pi k\left(n+\frac{1}{2}\right)}{N}\right)$$

[Figure: for k = 0, ..., 7, plots against n of $\mathrm{Re}\,e^{j2\pi\frac{k}{N}n}$, $\mathrm{Im}\,e^{j2\pi\frac{k}{N}n}$, and $\cos\left(\pi\frac{k}{N}\left(n+\frac{1}{2}\right)\right)$]

Energy Compaction Example: Ramp

For many authentic signals (photographs, etc.), the DCT has good “energy compaction”: most of the energy in the signal is represented by relatively few coefficients.

Consider the DFT vs. the DCT of a “ramp”:

[Figure: the ramp x[n] plotted against n]

[Figure: |X[k]| and |X_C[k]| for the ramp, each plotted against k]

Audio Compression

Last time, we looked at something akin to JPEG compression for images.
High-level, what we did was:

• 2D-DCT of 8-by-8 blocks of greyscale images
• In each block, zero out coefficients that are below some threshold

Let’s try the same approach with audio.

Audio Compression

That didn’t sound very good, really... :(

What were the most noticeable artifacts in the reconstructed version? Where did they come from? How did this compare to what we saw with JPEG?

Audio Compression v2

Let’s try a different approach:

Rather than zeroing out coefficients below the threshold, let’s quantize them differently (for example, use 8 bits for each value below the threshold and 16 bits for each value above the threshold).

How does this compare? What artifacts remain? How can we explain them?

MDCT

The biggest issue with the last scheme was artifacts at the frame boundaries.

Many modern audio compression schemes (MP3, AAC, WMA, Vorbis, ...) don’t use the DCT directly, but rather a related transform called the MDCT (Modified Discrete Cosine Transform), which mitigates these issues.

This is a lapped transform: 2N time-domain samples turn into N frequency-domain samples. By taking the transforms of overlapping windows and summing, we can reconstruct the original sequence exactly (similar to the overlap-add method we saw with the DFT). This principle is referred to as time-domain aliasing cancellation.

MDCT

[Figure: stacked panels over samples 0 to 500 showing x[n]; then, for each of two overlapping frames, the window, its MDCT, and the reconstructed frame; and finally the sum of the reconstructed frames]

MDCT

Formally, the MDCT is defined by:

$$X_M[k] = \frac{1}{2N}\sum_{n=0}^{2N-1} x[n]\cos\left(\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(k+\frac{1}{2}\right)\right)$$

$$y[n] = \sum_{k=0}^{N-1} X_M[k]\cos\left(\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(k+\frac{1}{2}\right)\right)$$

Including a window function on both $x[\cdot]$ and $y[\cdot]$ can avoid discontinuities at the endpoints. The MDCT is similar to the DCT in terms of energy compaction, but avoids issues with discontinuities on frame boundaries.

Audio Compression v3

Let’s look at a compression scheme that uses the MDCT.

What Else is There?
We have been able to achieve decent compression rates, but nothing close to MP3, for example. MP3 can achieve around a 6:1 compression ratio before expert listeners are able to distinguish between compressed and original audio.

This approach is actually somewhat similar to MP3, but we’re not quite there, so what are we missing?

Psychoacoustic Modeling

Importantly, our goal is ultimately to throw away information that is perceptually unimportant. To this end, MP3 includes a model of human perception of audio, including:

• Threshold of hearing: how loud must a signal be in order to hear it?
• Frequency masking: a loud component at a particular frequency “masks” nearby frequencies.
• Temporal masking: when two tones are close together in time, one can mask the other.

High-level overview

MP3 encoding process broken down into steps:

1. Filter the audio signal into frequency sub-bands
2. Determine the amount of masking for each band caused by nearby bands (in time and in frequency) using the psychoacoustic model
3. If the signal is too small (or if it is “masked” by nearby frequencies), don’t encode it
4. Otherwise, determine the number of bits needed to represent it such that the noise introduced by quantization is not audible (below the masking effect)
5. Put these bits together into the proper file format

Other Concerns

Other domain-specific codecs may use other strategies; for example, some audio codecs designed to compress speech (as opposed to music, etc.) will use something like LPC (discussed in the lecture on speech). They can then use a small number of bits to represent the parameters of the model, and use some additional bits to represent differences from that prediction.
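The "model parameters plus differences from the prediction" idea can be sketched with a plain linear predictor. This is only an illustration under stated assumptions, not how any particular speech codec is implemented: the test signal is synthetic (two sinusoids plus a little noise), the predictor order `p = 4` is arbitrary, and the coefficients are fit by ordinary least squares with `np.linalg.lstsq`. The point is that the prediction residual has much less energy than the signal, so it can be represented with fewer bits.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(400)
# Synthetic stand-in for a short stretch of audio (an assumption for illustration).
x = np.sin(0.2 * n) + 0.5 * np.sin(0.5 * n + 1.0) + 0.01 * rng.standard_normal(n.size)

p = 4  # predictor order (hypothetical choice)

# Each row holds the p previous samples [x[i-1], ..., x[i-p]] for i = p, ..., len(x)-1.
past = np.column_stack([x[p - r : len(x) - r] for r in range(1, p + 1)])
target = x[p:]

# Least-squares fit of the predictor coefficients a[0], ..., a[p-1].
a, *_ = np.linalg.lstsq(past, target, rcond=None)

# Residual: the difference between each sample and its prediction.
residual = target - past @ a
energy_ratio = np.sum(residual ** 2) / np.sum(target ** 2)
print("residual energy / signal energy:", energy_ratio)

# A decoder holding the first p samples, the coefficients, and the residual
# can rebuild the signal sample by sample.
decoded = np.empty_like(x)
decoded[:p] = x[:p]
for i in range(p, len(x)):
    decoded[i] = residual[i - p] + a @ decoded[i - p : i][::-1]
```

In a real codec the residual and the coefficients would themselves be quantized, which makes the scheme lossy; here they are kept at full precision, so the decoder recovers the signal up to rounding error.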