An Overview of Core Coding Tools in the AV1 Video Codec
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Data Compression: Dictionary-Based Coding 2 / 37 Dictionary-Based Coding Dictionary-Based Coding
Dictionary-based Coding already coded not yet coded search buffer look-ahead buffer cursor (N symbols) (L symbols) We know the past but cannot control it. We control the future but... Last Lecture Last Lecture: Predictive Lossless Coding Predictive Lossless Coding Simple and effective way to exploit dependencies between neighboring symbols / samples Optimal predictor: Conditional mean (requires storage of large tables) Affine and Linear Prediction Simple structure, low-complex implementation possible Optimal prediction parameters are given by solution of Yule-Walker equations Works very well for real signals (e.g., audio, images, ...) Efficient Lossless Coding for Real-World Signals Affine/linear prediction (often: block-adaptive choice of prediction parameters) Entropy coding of prediction errors (e.g., arithmetic coding) Using marginal pmf often already yields good results Can be improved by using conditional pmfs (with simple conditions) Heiko Schwarz (Freie Universität Berlin) — Data Compression: Dictionary-based Coding 2 / 37 Dictionary-based Coding Dictionary-Based Coding Coding of Text Files Very high amount of dependencies Affine prediction does not work (requires linear dependencies) Higher-order conditional coding should work well, but is way to complex (memory) Alternative: Do not code single characters, but words or phrases Example: English Texts Oxford English Dictionary lists less than 230 000 words (including obsolete words) On average, a word contains about 6 characters Average codeword length per character would be limited by 1 -
Video Codec Requirements and Evaluation Methodology
Video Codec Requirements 47pt 30pt and Evaluation Methodology Color::white : LT Medium Font to be used by customers and : Arial www.huawei.com draft-filippov-netvc-requirements-01 Alexey Filippov, Huawei Technologies 35pt Contents Font to be used by customers and partners : • An overview of applications • Requirements 18pt • Evaluation methodology Font to be used by customers • Conclusions and partners : Slide 2 Page 2 35pt Applications Font to be used by customers and partners : • Internet Protocol Television (IPTV) • Video conferencing 18pt • Video sharing Font to be used by customers • Screencasting and partners : • Game streaming • Video monitoring / surveillance Slide 3 35pt Internet Protocol Television (IPTV) Font to be used by customers and partners : • Basic requirements: . Random access to pictures 18pt Random Access Period (RAP) should be kept small enough (approximately, 1-15 seconds); Font to be used by customers . Temporal (frame-rate) scalability; and partners : . Error robustness • Optional requirements: . resolution and quality (SNR) scalability Slide 4 35pt Internet Protocol Television (IPTV) Font to be used by customers and partners : Resolution Frame-rate, fps Picture access mode 2160p (4K),3840x2160 60 RA 18pt 1080p, 1920x1080 24, 50, 60 RA 1080i, 1920x1080 30 (60 fields per second) RA Font to be used by customers and partners : 720p, 1280x720 50, 60 RA 576p (EDTV), 720x576 25, 50 RA 576i (SDTV), 720x576 25, 30 RA 480p (EDTV), 720x480 50, 60 RA 480i (SDTV), 720x480 25, 30 RA Slide 5 35pt Video conferencing Font to be used by customers and partners : • Basic requirements: . Delay should be kept as low as possible 18pt The preferable and maximum delay values should be less than 100 ms and 350 ms, respectively Font to be used by customers . -
1. in the New Document Dialog Box, on the General Tab, for Type, Choose Flash Document, Then Click______
1. In the New Document dialog box, on the General tab, for type, choose Flash Document, then click____________. 1. Tab 2. Ok 3. Delete 4. Save 2. Specify the export ________ for classes in the movie. 1. Frame 2. Class 3. Loading 4. Main 3. To help manage the files in a large application, flash MX professional 2004 supports the concept of _________. 1. Files 2. Projects 3. Flash 4. Player 4. In the AppName directory, create a subdirectory named_______. 1. Source 2. Flash documents 3. Source code 4. Classes 5. In Flash, most applications are _________ and include __________ user interfaces. 1. Visual, Graphical 2. Visual, Flash 3. Graphical, Flash 4. Visual, AppName 6. Test locally by opening ________ in your web browser. 1. AppName.fla 2. AppName.html 3. AppName.swf 4. AppName 7. The AppName directory will contain everything in our project, including _________ and _____________ . 1. Source code, Final input 2. Input, Output 3. Source code, Final output 4. Source code, everything 8. In the AppName directory, create a subdirectory named_______. 1. Compiled application 2. Deploy 3. Final output 4. Source code 9. Every Flash application must include at least one ______________. 1. Flash document 2. AppName 3. Deploy 4. Source 10. In the AppName/Source directory, create a subdirectory named __________. 1. Source 2. Com 3. Some domain 4. AppName 11. In the AppName/Source/Com directory, create a sub directory named ______ 1. Some domain 2. Com 3. AppName 4. Source 12. A project is group of related _________ that can be managed via the project panel in the flash. -
Video Coding Standards
Video coding standards Video signals represent sequences of images or frames which can be transmitted with a rate from 15 to 60 frames per second (fps), that provides the illusion of motion in the displayed signal. Unlike images they contain the so-called temporal redundancy. Temporal redundancy arises from repeated objects in consecutive frames of the video sequence. Such objects can remain, they can move horizontally, vertically, or any combination of directions (translation movement), they can fade in and out, and they can disappear from the image as they move out of view. Temporal redundancy Motion compensation A motion compensation technique is used to compensate the temporal redundancy of video sequence. The main idea of the this method is to predict the displacement of group of pixels (usually block of pixels) from their position in the previous frame. Information about this displacement is represented by motion vectors which are transmitted together with the DCT coded difference between the predicted and the original images. Motion compensation Image VLC DCT Quantizer Buffer - Dequantizer Coded DCT coefficients. IDCT Motion compensated predictor Motion Motion vectors estimation Decoded VL image Buffer Dequantizer IDCT decoder Motion compensated predictor Motion vectors Motion compensation Previous frame Current frame Set of macroblocks of the previous frame used to predict the selected macroblock of the current frame Motion compensation Previous frame Current frame Macroblocks of previous frame used to predict current frame Motion compensation Each 16x16 pixel macroblock in the current frame is compared with a set of macroblocks in the previous frame to determine the one that best predicts the current macroblock. -
Annual Report 2016
ANNUAL REPORT 2016 PUNJABI UNIVERSITY, PATIALA © Punjabi University, Patiala (Established under Punjab Act No. 35 of 1961) Editor Dr. Shivani Thakar Asst. Professor (English) Department of Distance Education, Punjabi University, Patiala Laser Type Setting : Kakkar Computer, N.K. Road, Patiala Published by Dr. Manjit Singh Nijjar, Registrar, Punjabi University, Patiala and Printed at Kakkar Computer, Patiala :{Bhtof;Nh X[Bh nk;k wjbk ñ Ò uT[gd/ Ò ftfdnk thukoh sK goT[gekoh Ò iK gzu ok;h sK shoE tk;h Ò ñ Ò x[zxo{ tki? i/ wB[ bkr? Ò sT[ iw[ ejk eo/ w' f;T[ nkr? Ò ñ Ò ojkT[.. nk; fBok;h sT[ ;zfBnk;h Ò iK is[ i'rh sK ekfJnk G'rh Ò ò Ò dfJnk fdrzpo[ d/j phukoh Ò nkfg wo? ntok Bj wkoh Ò ó Ò J/e[ s{ j'fo t/; pj[s/o/.. BkBe[ ikD? u'i B s/o/ Ò ô Ò òõ Ò (;qh r[o{ rqzE ;kfjp, gzBk óôù) English Translation of University Dhuni True learning induces in the mind service of mankind. One subduing the five passions has truly taken abode at holy bathing-spots (1) The mind attuned to the infinite is the true singing of ankle-bells in ritual dances. With this how dare Yama intimidate me in the hereafter ? (Pause 1) One renouncing desire is the true Sanayasi. From continence comes true joy of living in the body (2) One contemplating to subdue the flesh is the truly Compassionate Jain ascetic. Such a one subduing the self, forbears harming others. (3) Thou Lord, art one and Sole. -
On Audio-Visual File Formats
On Audio-Visual File Formats Summary • digital audio and digital video • container, codec, raw data • different formats for different purposes Reto Kromer • AV Preservation by reto.ch • audio-visual data transformations Film Preservation and Restoration Hyderabad, India 8–15 December 2019 1 2 Digital Audio • sampling Digital Audio • quantisation 3 4 Sampling • 44.1 kHz • 48 kHz • 96 kHz • 192 kHz digitisation = sampling + quantisation 5 6 Quantisation • 16 bit (216 = 65 536) • 24 bit (224 = 16 777 216) • 32 bit (232 = 4 294 967 296) Digital Video 7 8 Digital Video Resolution • resolution • SD 480i / SD 576i • bit depth • HD 720p / HD 1080i • linear, power, logarithmic • 2K / HD 1080p • colour model • 4K / UHD-1 • chroma subsampling • 8K / UHD-2 • illuminant 9 10 Bit Depth Linear, Power, Logarithmic • 8 bit (28 = 256) «medium grey» • 10 bit (210 = 1 024) • linear: 18% • 12 bit (212 = 4 096) • power: 50% • 16 bit (216 = 65 536) • logarithmic: 50% • 24 bit (224 = 16 777 216) 11 12 Colour Model • XYZ, L*a*b* • RGB / R′G′B′ / CMY / C′M′Y′ • Y′IQ / Y′UV / Y′DBDR • Y′CBCR / Y′COCG • Y′PBPR 13 14 15 16 17 18 RGB24 00000000 11111111 00000000 00000000 00000000 00000000 11111111 00000000 00000000 00000000 00000000 11111111 00000000 11111111 11111111 11111111 11111111 00000000 11111111 11111111 11111111 11111111 00000000 11111111 19 20 Compression Uncompressed • uncompressed + data simpler to process • lossless compression + software runs faster • lossy compression – bigger files • chroma subsampling – slower writing, transmission and reading • born -
The Discrete Cosine Transform (DCT): Theory and Application
The Discrete Cosine Transform (DCT): 1 Theory and Application Syed Ali Khayam Department of Electrical & Computer Engineering Michigan State University March 10th 2003 1 This document is intended to be tutorial in nature. No prior knowledge of image processing concepts is assumed. Interested readers should follow the references for advanced material on DCT. ECE 802 – 602: Information Theory and Coding Seminar 1 – The Discrete Cosine Transform: Theory and Application 1. Introduction Transform coding constitutes an integral component of contemporary image/video processing applications. Transform coding relies on the premise that pixels in an image exhibit a certain level of correlation with their neighboring pixels. Similarly in a video transmission system, adjacent pixels in consecutive frames2 show very high correlation. Consequently, these correlations can be exploited to predict the value of a pixel from its respective neighbors. A transformation is, therefore, defined to map this spatial (correlated) data into transformed (uncorrelated) coefficients. Clearly, the transformation should utilize the fact that the information content of an individual pixel is relatively small i.e., to a large extent visual contribution of a pixel can be predicted using its neighbors. A typical image/video transmission system is outlined in Figure 1. The objective of the source encoder is to exploit the redundancies in image data to provide compression. In other words, the source encoder reduces the entropy, which in our case means decrease in the average number of bits required to represent the image. On the contrary, the channel encoder adds redundancy to the output of the source encoder in order to enhance the reliability of the transmission. -
Amphion Video Codecs in an AI World
Video Codecs In An AI World Dr Doug Ridge Amphion Semiconductor The Proliferance of Video in Networks • Video produces huge volumes of data • According to Cisco “By 2021 video will make up 82% of network traffic” • Equals 3.3 zetabytes of data annually • 3.3 x 1021 bytes • 3.3 billion terabytes AI Engines Overview • Example AI network types include Artificial Neural Networks, Spiking Neural Networks and Self-Organizing Feature Maps • Learning and processing are automated • Processing • AI engines designed for processing huge amounts of data quickly • High degree of parallelism • Much greater performance and significantly lower power than CPU/GPU solutions • Learning and Inference • AI ‘learns’ from masses of data presented • Data presented as Input-Desired Output or as unmarked input for self-organization • AI network can start processing once initial training takes place Typical Applications of AI • Reduce data to be sorted manually • Example application in analysis of mammograms • 99% reduction in images send for analysis by specialist • Reduction in workload resulted in huge reduction in wrong diagnoses • Aid in decision making • Example application in traffic monitoring • Identify areas of interest in imagery to focus attention • No definitive decision made by AI engine • Perform decision making independently • Example application in security video surveillance • Alerts and alarms triggered by AI analysis of behaviours in imagery • Reduction in false alarms and more attention paid to alerts by security staff Typical Video Surveillance -
Arithmetic Coding
Arithmetic Coding Arithmetic coding is the most efficient method to code symbols according to the probability of their occurrence. The average code length corresponds exactly to the possible minimum given by information theory. Deviations which are caused by the bit-resolution of binary code trees do not exist. In contrast to a binary Huffman code tree the arithmetic coding offers a clearly better compression rate. Its implementation is more complex on the other hand. In arithmetic coding, a message is encoded as a real number in an interval from one to zero. Arithmetic coding typically has a better compression ratio than Huffman coding, as it produces a single symbol rather than several separate codewords. Arithmetic coding differs from other forms of entropy encoding such as Huffman coding in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, a fraction n where (0.0 ≤ n < 1.0) Arithmetic coding is a lossless coding technique. There are a few disadvantages of arithmetic coding. One is that the whole codeword must be received to start decoding the symbols, and if there is a corrupt bit in the codeword, the entire message could become corrupt. Another is that there is a limit to the precision of the number which can be encoded, thus limiting the number of symbols to encode within a codeword. There also exist many patents upon arithmetic coding, so the use of some of the algorithms also call upon royalty fees. Arithmetic coding is part of the JPEG data format. -
A Practical Approach to Spatiotemporal Data Compression
A Practical Approach to Spatiotemporal Data Compres- sion Niall H. Robinson1, Rachel Prudden1 & Alberto Arribas1 1Informatics Lab, Met Office, Exeter, UK. Datasets representing the world around us are becoming ever more unwieldy as data vol- umes grow. This is largely due to increased measurement and modelling resolution, but the problem is often exacerbated when data are stored at spuriously high precisions. In an effort to facilitate analysis of these datasets, computationally intensive calculations are increasingly being performed on specialised remote servers before the reduced data are transferred to the consumer. Due to bandwidth limitations, this often means data are displayed as simple 2D data visualisations, such as scatter plots or images. We present here a novel way to efficiently encode and transmit 4D data fields on-demand so that they can be locally visualised and interrogated. This nascent “4D video” format allows us to more flexibly move the bound- ary between data server and consumer client. However, it has applications beyond purely scientific visualisation, in the transmission of data to virtual and augmented reality. arXiv:1604.03688v2 [cs.MM] 27 Apr 2016 With the rise of high resolution environmental measurements and simulation, extremely large scientific datasets are becoming increasingly ubiquitous. The scientific community is in the pro- cess of learning how to efficiently make use of these unwieldy datasets. Increasingly, people are interacting with this data via relatively thin clients, with data analysis and storage being managed by a remote server. The web browser is emerging as a useful interface which allows intensive 1 operations to be performed on a remote bespoke analysis server, but with the resultant information visualised and interrogated locally on the client1, 2. -
(A/V Codecs) REDCODE RAW (.R3D) ARRIRAW
What is a Codec? Codec is a portmanteau of either "Compressor-Decompressor" or "Coder-Decoder," which describes a device or program capable of performing transformations on a data stream or signal. Codecs encode a stream or signal for transmission, storage or encryption and decode it for viewing or editing. Codecs are often used in videoconferencing and streaming media solutions. A video codec converts analog video signals from a video camera into digital signals for transmission. It then converts the digital signals back to analog for display. An audio codec converts analog audio signals from a microphone into digital signals for transmission. It then converts the digital signals back to analog for playing. The raw encoded form of audio and video data is often called essence, to distinguish it from the metadata information that together make up the information content of the stream and any "wrapper" data that is then added to aid access to or improve the robustness of the stream. Most codecs are lossy, in order to get a reasonably small file size. There are lossless codecs as well, but for most purposes the almost imperceptible increase in quality is not worth the considerable increase in data size. The main exception is if the data will undergo more processing in the future, in which case the repeated lossy encoding would damage the eventual quality too much. Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronization of the audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format. -
Encoding H.264 Video for Streaming and Progressive Download
W4: KEY ENCODING SKILLS, TECHNOLOGIES TECHNIQUES STREAMING MEDIA EAST - 2019 Jan Ozer www.streaminglearningcenter.com [email protected]/ 276-235-8542 @janozer Agenda • Introduction • Lesson 5: How to build encoding • Lesson 1: Delivering to Computers, ladder with objective quality metrics Mobile, OTT, and Smart TVs • Lesson 6: Current status of CMAF • Lesson 2: Codec review • Lesson 7: Delivering with dynamic • Lesson 3: Delivering HEVC over and static packaging HLS • Lesson 4: Per-title encoding Lesson 1: Delivering to Computers, Mobile, OTT, and Smart TVs • Computers • Mobile • OTT • Smart TVs Choosing an ABR Format for Computers • Can be DASH or HLS • Factors • Off-the-shelf player vendor (JW Player, Bitmovin, THEOPlayer, etc.) • Encoding/transcoding vendor Choosing an ABR Format for iOS • Native support (playback in the browser) • HTTP Live Streaming • Playback via an app • Any, including DASH, Smooth, HDS or RTMP Dynamic Streaming iOS Media Support Native App Codecs H.264 (High, Level 4.2), HEVC Any (Main10, Level 5 high) ABR formats HLS Any DRM FairPlay Any Captions CEA-608/708, WebVTT, IMSC1 Any HDR HDR10, DolbyVision ? http://bit.ly/hls_spec_2017 iOS Encoding Ladders H.264 HEVC http://bit.ly/hls_spec_2017 HEVC Hardware Support - iOS 3 % bit.ly/mobile_HEVC http://bit.ly/glob_med_2019 Android: Codec and ABR Format Support Codecs ABR VP8 (2.3+) • Multiple codecs and ABR H.264 (3+) HLS (3+) technologies • Serious cautions about HLS • DASH now close to 97% • HEVC VP9 (4.4+) DASH 4.4+ Via MSE • Main Profile Level 3 – mobile HEVC (5+)