ECEN 5623 RT Embedded Systems

Computer and Machine Vision
Deeper Dive into MPEG Digital Video Encoding
January 22, 2014
Sam Siewert

Reminders
CV and MV Use UNCOMPRESSED FRAMES
Remote Cameras (E.g. Security) May Need to Transport Captured Frames Over a Network to the CV/MV Processor
We NEED to Understand Both!
BEWARE of LOSSY COMPRESSION
I-Frame ONLY or MJPEG is a Decent Compromise of Both

MPEG: Order Of Operators
[Figure: diagram with stages labeled #1, #2A, #2B, #2C, #3]
#1: POINT (Pixel) Encoding
#2 A-C: Macro-Block Lossy Intra-Frame Compression
#3: Motion-Based Compression in Group of Pictures

Step #1 – RGB to YCrCb 4:4:4 24-bit (Lossless)
For every Y sample in a scan-line, there is also one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– No compression between RGB and YCrCb 4:4:4 (both 24 bits/pixel)
Typically a Post Production, CEDIA or DCI format
[Figure: pixel grid numbered 0 … 319 (first scan-line) through 76,480 … 76,799 (last scan-line); legend: "Y, Cr, and Cb sample" and "Y sample only"]

Step #1 – RGB to YCrCb 4:2:2 (Lossy)
For every 2 Y samples in a scan-line, one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– Two RGB Pixels = 48 bits, Whereas Two YCrCb is 32 bits, or 16 bits per pixel vs. 24 bits per pixel (33% smaller frame size)
[Figure: same pixel grid, annotated "48 bit to 32 bit"; legend: "Y, Cr, and Cb sample" and "Y sample only"]

Step #1 – RGB to YCrCb 4:2:0 (Lossy)
For every 4 Y samples (a 2x2 block spanning two scan-lines), one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– Four RGB Pixels = 96 bits, Whereas Four YCrCb is 48 bits, or 12 bits per pixel on average vs. 24 bits per pixel (50% smaller)
[Figure: same pixel grid; legend: "Cr, Cb sample" and "Y sample only"]

Step #2 – Convert to 8x8 Macroblocks and Transform
Aspect Ratios Designed to Fit 8x8 Macroblocks
– E.g. 640 x 480 => 80 x 60 Macroblocks
Discrete Cosine Transform Applied to Each 8x8
– Spatial Intensity to Frequency Transform
– Applied on X Axis (Row)
– Applied on Y Axis (Column)
Set up for Intra-frame (I-frame) Compression

Convolution Concepts
Math operation on 2 functions that produces a 3rd
Point Spread Function "Sharpen" meets this Definition
So do Many Mask Operations applied to Pixel Neighborhoods
[Figure: 2 impulses, f(t) and g(X – t); the area inside their intersection is f convolved with g over t]

DCT – Discrete Cosine Transform
Convolution of Image with Discrete Cosine
See http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/example-dct1/
De-convolved to restore image from Convolved Image
[Figure: image panels labeled "DCT" and "Inverse DCT"]

DCT Concepts
F(x) is a sum of sinusoids (with frequency, amplitude)
DCT operates on a discrete number of samples
Can derive DC sum at any x, even where F(x) not known
N x N Macro-block has Zero Frequency DC at 0,0
– Increasing Horizontal Frequency
– Increasing Vertical Frequency
Can De-convolve (inverse DCT, or iDCT)
Can Eliminate High Frequency Horizontal and Vertical Terms
– Minimal Losses from Truncation (otherwise lossless)
– Loss of High Frequency Image Features (What are These?)

Basic Concept of Waveforms
Complex Waveform is Sum of Simple Fundamentals
Simple Fundamentals Can Be Derived from Complex

Scanline DCT Example
Small Losses Due to DCT, iDCT Numerical Truncation
Larger Losses Due to H.O.T. (Higher-Order Term) Quantization and Truncation
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_doc/1D-DCT-N-Fundamentals.xlsx

What Is Lost with DCT Quantization?
Noise More Than Anything Else
Complex XY Variable Patterns (Real Science Data?)
[Figure: three example images]
– Complex Tiling: Higher Frequency X, Higher Frequency Y Terms Can Still be Ignored
– Complex Wood Texture: Most Detail in X, Far Less in Y
– Randomized Texture Image: High X Detail, High Y Detail; Most Loss of Detail, But Noisy

Step #2A: Macro-block Discrete Cosine Transform
8x8 Pixel Block (Macro-block)
– SD NTSC 720x480 (90x60 Macro-blocks), 3:2 Aspect Ratio
– HD 720 1280x720 (160x90 Macro-blocks), 16:9 AR
– HD 1080 1920x1080 (240x135 Macro-blocks), 16:9 AR

Step #2B: Macro-block Quantization (Lossy)
Apply 8x8 Weighting and Scaling to DCT
Produces Lots of Repeated Values (and Zeros) Compared to Original

Decode Process for #2A-B
[Figure only]

How Lossy is the Decoded Macro-Block?
[Figure only]

OpenCV Macroblock DCT Example
Same Cactus 320x240 with 80x80 DCT Macroblocks
[Figure: panels labeled "DCT" and "iDCT"]
Same Cactus 320x240 Again with 8x8 DCT Macroblocks
[Figure: panels labeled "DCT" and "iDCT"]

Mathematics for 2D DCT
Frequency Variation on X and Y axes from top left to bottom right
Straightforward Algorithm Based on 2D Equation is O(n^2) per dimension
Like Cooley-Tukey for DFT, a DCT Algorithm that is O(n*log2(n)) has been formulated (Arai, Y.; Agui, T.; Nakajima, M.; see also Numerical Recipes: The Art of Scientific Computing (3rd ed.))
[Figure: http://en.wikipedia.org/wiki/File:Dctjpeg.png]
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/dct2/dct2.c

Step #2C: Macro-block Run-Length and Huffman Encoding
Zig-Zag Run-Length Encoding to Exploit Repeated Data and Zeros found in H.O.T. of Quantized DCT
– 86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0, …
Becomes:
[Figure: run-length encoded result, not recoverable from the extracted text]

Huffman Applied to RLE Data
Huffman Tables for MPEG-2 Macro-Blocks Defined in 13818-2
(Lossless) Compression Based on Probability of Occurrence
Shannon's Source Coding Theory: -log2(P) bits per symbol, P = probability of occurrence, Binary encoding of Symbols

Step #3: Group of Pictures Concept – Transmit Change-Only Data
I-Frame Compressed Only Intra-Frame By Methods #2A-2C Applied to Macro-Blocks
I-Frame Can Be Decoded Alone
P-Frame is Differences Only Over the GoP
B-Frame is Differences Only Between Both I-Frame and Closest P-Frame
Difference Data Can be Further Encoded with Lossless Methods
Without Steps 2A-C, Specifically Quantization, and With High Motion Video, the Difference Data Could Blow Up

Group of Pictures: High Level View
[Figure only]

Overall MPEG YCrCb Compression Performance
Standard Definition 720x480x2 (675KB/frame) @ 30fps
– Requires 20MB/sec (200 Mbps) Uncompressed
– Typical MPEG-2 @ 3.75 Mbps, > 50x Compression
– Typical MPEG-4 @ 1.5 Mbps, > 100x Compression
– 10 to 20 Programs on QAM 256 (48 Mbps, 6 MHz/Ch)
– ≈10 MPEG-4 Programs on ATSC 8VSB (19.39 Mbps, 6 MHz/Ch)
HD 720p (1280x720x2, 1800KB/frame) @ 30fps
– Requires 53MB/sec (530 Mbps) Uncompressed
– Typical MPEG-2 @ 20 Mbps, > 25x Compression
– Typical MPEG-4 @ 10 Mbps, > 50x Compression
HD 1080p (1920x1080x2, 4050KB/frame) @ 30fps
– Requires 120MB/sec (1200 Mbps) Uncompressed
– Typical MPEG-2, VC-1 @ 45 Mbps, > 30x Compression
– Typical MPEG-4 @ 20 Mbps, > 60x Compression

Parsing an Elementary Video Stream
Many 188-Byte Packet Types and Header
Allows for Multiplexing of many Video and Audio Streams on a Carrier

MPEG-4 vs. MPEG-2
MPEG-2
– Defined by ISO 13818-1, 13818-2
– Leverages MPEG-1 (Moving Picture Experts Group – 1988)
– Widely Used for Digital Video – Digital Cable TV, DVD
– Transport Stream designed for Broadcast (Lossy, No Beginning or End of Stream)
  ATSC – Advanced Television Systems Committee (HDTV Broadcast)
    – 8VSB Modulation – 8-level Vestigial Sideband Modulation, 6 MHz channel, 19.39 Mbps, Reed-Solomon Error Correction
    – Up to 1080p (1920x1080) Video Resolution
    – AC-3 (Dolby) Audio
  DVB – Digital Video Broadcasting (Europe, Satellite)
– Program Stream designed for Playback Media (DVD, Flash, HDD, etc.)
MPEG-4
– Defined by ISO 14496 (1998)
– Leverages MPEG-2 Standards for Program/Transport, Encode/Decode
– Better Compression Rates (improved motion prediction for P, B frames), MPEG-4 Part-10 (H.264), e.g. Blu-Ray
– Extensions for Digital Rights Management
– Advanced Audio Coding
– Becoming More Widely Deployed for HD and Because of Lower Bit-Rate Transport Streams
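
Worked example: Step #1 bits per pixel and uncompressed rates
The bits-per-pixel figures in Step #1 and the per-frame sizes on the compression performance slide can be checked with a few lines of arithmetic. This is a minimal sketch, not course code: the function names and the FORMATS table are hypothetical, and 8-bit samples are assumed as stated on the slides. Exact arithmetic gives about 166 Mbps for SD 4:2:2 at 30 fps, so the deck's Mbps figures look like round approximations.

```python
# Sketch only: names below are illustrative, not from the course material.

def bits_per_pixel(y_samples, chroma_pairs, bits_per_sample=8):
    """Average bits/pixel when y_samples pixels share chroma_pairs (Cr, Cb) pairs."""
    total_bits = (y_samples + 2 * chroma_pairs) * bits_per_sample
    return total_bits / y_samples

# (number of Y samples, number of shared CrCb pairs) per sampling group
FORMATS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (4, 1)}

def frame_bytes(width, height, fmt):
    """Uncompressed frame size in bytes for the given chroma format."""
    y_samples, chroma_pairs = FORMATS[fmt]
    return width * height * bits_per_pixel(y_samples, chroma_pairs) / 8

def uncompressed_mbps(width, height, fmt, fps=30):
    """Uncompressed bit rate in megabits per second (1 Mb = 1e6 bits)."""
    return frame_bytes(width, height, fmt) * 8 * fps / 1e6

if __name__ == "__main__":
    for name, (w, h) in {"SD": (720, 480), "720p": (1280, 720), "1080p": (1920, 1080)}.items():
        kb = frame_bytes(w, h, "4:2:2") / 1024
        print(name, f"{kb:.0f} KB/frame", f"{uncompressed_mbps(w, h, '4:2:2'):.1f} Mbps")
```

The "x2" in 720x480x2 corresponds to the 2 bytes (16 bits) per pixel of 4:2:2.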
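
Worked example: Steps #2A-B on one 8x8 block
Steps #2A and #2B (DCT, then quantization) can be sketched in pure Python on a single 8x8 block. Assumptions, stated loudly: this uses the orthonormal DCT-II applied along rows and then columns (the straightforward O(n^2)-per-dimension form, not the fast algorithm), and a single uniform quantizer step in place of the weighting matrix on the slide. It is not the code behind the dct2.c link, and the function names are hypothetical.

```python
import math

N = 8  # macro-block edge length used throughout the slides

def _basis(k, n):
    """Orthonormal DCT-II basis value for frequency k at sample n."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))

C = [[_basis(k, n) for n in range(N)] for k in range(N)]  # C[k][n]

def dct2(block):
    """2D DCT: transform along one axis, then the other (F = C * block * C^T)."""
    tmp = [[sum(C[u][x] * block[x][y] for x in range(N)) for y in range(N)]
           for u in range(N)]
    return [[sum(tmp[u][y] * C[v][y] for y in range(N)) for v in range(N)]
            for u in range(N)]

def idct2(coeffs):
    """Inverse 2D DCT (block = C^T * F * C); exact up to floating-point rounding."""
    tmp = [[sum(C[u][x] * coeffs[u][v] for u in range(N)) for v in range(N)]
           for x in range(N)]
    return [[sum(tmp[x][v] * C[v][y] for v in range(N)) for y in range(N)]
            for x in range(N)]

def quantize(coeffs, step):
    """Lossy step: uniform quantizer (an assumption; MPEG uses a weighting matrix)."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    return [[level * step for level in row] for row in levels]

if __name__ == "__main__":
    ramp = [[16 * x + 2 * y for y in range(N)] for x in range(N)]  # smooth gradient
    levels = quantize(dct2(ramp), step=16)
    restored = idct2(dequantize(levels, step=16))
    zeros = sum(v == 0 for row in levels for v in row)
    worst = max(abs(restored[x][y] - ramp[x][y]) for x in range(N) for y in range(N))
    print("zero coefficients:", zeros, "worst pixel error:", round(worst, 2))
```

Without the quantize step, idct2(dct2(block)) returns the block up to rounding error, which matches the "small losses" versus "larger losses" distinction on the Scanline DCT Example slide.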
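
Worked example: Step #2C zig-zag scan and run-length encoding
The zig-zag scan and run-length step of Step #2C can be sketched as below. The encoded result on the slide was a figure and is not in the extracted text, so the (zero-run, value) pair format with an "EOB" end-of-block marker used here is an illustrative convention, not necessarily what the slide showed and not the exact MPEG-2 syntax. The function names are hypothetical, and the coefficients after the slide's "…" are assumed to be zero.

```python
def zigzag_order(n=8):
    """(row, col) visiting order that walks anti-diagonals in alternating direction."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )

def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_length_encode(coeffs):
    """Encode as (number of preceding zeros, nonzero value) pairs, then 'EOB'."""
    pairs, run = [], 0
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs

# Sequence from the Step #2C slide; padding to 64 values with zeros is an assumption.
SLIDE_SEQUENCE = [86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0]
PADDED = SLIDE_SEQUENCE + [0] * (64 - len(SLIDE_SEQUENCE))

if __name__ == "__main__":
    print(run_length_encode(PADDED))
```

Under these assumptions the 64 coefficients collapse to 11 pairs plus the end marker, which is the repeated-zero redundancy the slide refers to.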
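
Worked example: Huffman code lengths vs. -log2(P)
The relation between symbol probability and code length on the Huffman slide can be illustrated with a small Huffman builder. This is a generic sketch with a made-up four-symbol alphabet; it builds a code from given probabilities, whereas MPEG-2 uses the fixed tables defined in 13818-2. The function name is hypothetical.

```python
import heapq
import math
from itertools import count

def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a dict of symbol probabilities."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths

if __name__ == "__main__":
    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # illustrative values
    lengths = huffman_code_lengths(probs)
    for sym, p in probs.items():
        print(sym, "ideal bits:", -math.log2(p), "Huffman bits:", lengths[sym])
```

When every probability is a power of 1/2, as in this example, the Huffman lengths equal -log2(P) exactly; otherwise they are the closest achievable whole-bit lengths.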
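
Worked example: reading a 188-byte packet header
The 188-byte packets on the "Parsing an Elementary Video Stream" slide are MPEG-2 Transport Stream packets (13818-1). A minimal sketch of reading the fixed 4-byte header follows, assuming the packet is already aligned on the 0x47 sync byte. The function name and the returned dictionary keys are hypothetical; adaptation fields and payload parsing are left out.

```python
TS_PACKET_SIZE = 188
TS_SYNC_BYTE = 0x47

def parse_ts_header(packet: bytes) -> dict:
    """Decode the fixed 4-byte header of one MPEG-2 Transport Stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport stream packets are 188 bytes")
    if packet[0] != TS_SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit packet identifier: selects which multiplexed stream this belongs to
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "adaptation_field_control": (packet[3] >> 4) & 0x3,
        "continuity_counter": packet[3] & 0x0F,
    }

if __name__ == "__main__":
    # Hand-built packet for illustration: PID 0x0011, payload only, counter 10.
    demo = bytes([0x47, 0x40, 0x11, 0x1A]) + bytes(184)
    print(parse_ts_header(demo))
```

The PID is what allows many video and audio streams to share one carrier: a demultiplexer keeps only the packets whose PID matches the stream it wants.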