Efficient AV1 Video Coding Using a Multi-Layer Framework
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Kulkarni Uta 2502M 11649.Pdf
IMPLEMENTATION OF A FAST INTER-PREDICTION MODE DECISION IN H.264/AVC VIDEO ENCODER by AMRUTA KIRAN KULKARNI Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE IN ELECTRICAL ENGINEERING THE UNIVERSITY OF TEXAS AT ARLINGTON May 2012 ACKNOWLEDGEMENTS First and foremost, I would like to take this opportunity to offer my gratitude to my supervisor, Dr. K.R. Rao, who invested his precious time in me and has been a constant support throughout my thesis with his patience and profound knowledge. His motivation and enthusiasm helped me in all the time of research and writing of this thesis. His advising and mentoring have helped me complete my thesis. Besides my advisor, I would like to thank the rest of my thesis committee. I am also very grateful to Dr. Dongil Han for his continuous technical advice and financial support. I would like to acknowledge my research group partner, Santosh Kumar Muniyappa, for all the valuable discussions that we had together. It helped me in building confidence and motivated towards completing the thesis. Also, I thank all other lab mates and friends who helped me get through two years of graduate school. Finally, my sincere gratitude and love go to my family. They have been my role model and have always showed me right way. Last but not the least; I would like to thank my husband Amey Mahajan for his emotional and moral support. April 20, 2012 ii ABSTRACT IMPLEMENTATION OF A FAST INTER-PREDICTION MODE DECISION IN H.264/AVC VIDEO ENCODER Amruta Kulkarni, M.S The University of Texas at Arlington, 2011 Supervising Professor: K.R. -
Video Codec Requirements and Evaluation Methodology
Video Codec Requirements 47pt 30pt and Evaluation Methodology Color::white : LT Medium Font to be used by customers and : Arial www.huawei.com draft-filippov-netvc-requirements-01 Alexey Filippov, Huawei Technologies 35pt Contents Font to be used by customers and partners : • An overview of applications • Requirements 18pt • Evaluation methodology Font to be used by customers • Conclusions and partners : Slide 2 Page 2 35pt Applications Font to be used by customers and partners : • Internet Protocol Television (IPTV) • Video conferencing 18pt • Video sharing Font to be used by customers • Screencasting and partners : • Game streaming • Video monitoring / surveillance Slide 3 35pt Internet Protocol Television (IPTV) Font to be used by customers and partners : • Basic requirements: . Random access to pictures 18pt Random Access Period (RAP) should be kept small enough (approximately, 1-15 seconds); Font to be used by customers . Temporal (frame-rate) scalability; and partners : . Error robustness • Optional requirements: . resolution and quality (SNR) scalability Slide 4 35pt Internet Protocol Television (IPTV) Font to be used by customers and partners : Resolution Frame-rate, fps Picture access mode 2160p (4K),3840x2160 60 RA 18pt 1080p, 1920x1080 24, 50, 60 RA 1080i, 1920x1080 30 (60 fields per second) RA Font to be used by customers and partners : 720p, 1280x720 50, 60 RA 576p (EDTV), 720x576 25, 50 RA 576i (SDTV), 720x576 25, 30 RA 480p (EDTV), 720x480 50, 60 RA 480i (SDTV), 720x480 25, 30 RA Slide 5 35pt Video conferencing Font to be used by customers and partners : • Basic requirements: . Delay should be kept as low as possible 18pt The preferable and maximum delay values should be less than 100 ms and 350 ms, respectively Font to be used by customers . -
Microsoft Powerpoint
Development of Multimedia WebApp on Tizen Platform 1. HTML Multimedia 2. Multimedia Playing with HTML5 Tags (1) HTML5 Video (2) HTML5 Audio (3) HTML Pulg-ins (4) HTML YouTube (5) Accessing Media Streams and Playing (6) Multimedia Contents Mgmt (7) Capturing Images 3. Multimedia Processing Web Device API Multimedia WepApp on Tizen - 1 - 1. HTML Multimedia • What is Multimedia ? − Multimedia comes in many different formats. It can be almost anything you can hear or see. − Examples : Pictures, music, sound, videos, records, films, animations, and more. − Web pages often contain multimedia elements of different types and formats. • Multimedia Formats − Multimedia elements (like sounds or videos) are stored in media files. − The most common way to discover the type of a file, is to look at the file extension. ⇔ When a browser sees the file extension .htm or .html, it will treat the file as an HTML file. ⇔ The .xml extension indicates an XML file, and the .css extension indicates a style sheet file. ⇔ Pictures are recognized by extensions like .gif, .png and .jpg. − Multimedia files also have their own formats and different extensions like: .swf, .wav, .mp3, .mp4, .mpg, .wmv, and .avi. Multimedia WepApp on Tizen - 2 - 2. Multimedia Playing with HTML5 Tags (1) HTML5 Video • Some of the popular video container formats include the following: Audio Video Interleave (.avi) Flash Video (.flv) MPEG 4 (.mp4) Matroska (.mkv) Ogg (.ogv) • Browser Support Multimedia WepApp on Tizen - 3 - • Common Video Format Format File Description .mpg MPEG. Developed by the Moving Pictures Expert Group. The first popular video format on the MPEG .mpeg web. -
On Audio-Visual File Formats
On Audio-Visual File Formats Summary • digital audio and digital video • container, codec, raw data • different formats for different purposes Reto Kromer • AV Preservation by reto.ch • audio-visual data transformations Film Preservation and Restoration Hyderabad, India 8–15 December 2019 1 2 Digital Audio • sampling Digital Audio • quantisation 3 4 Sampling • 44.1 kHz • 48 kHz • 96 kHz • 192 kHz digitisation = sampling + quantisation 5 6 Quantisation • 16 bit (216 = 65 536) • 24 bit (224 = 16 777 216) • 32 bit (232 = 4 294 967 296) Digital Video 7 8 Digital Video Resolution • resolution • SD 480i / SD 576i • bit depth • HD 720p / HD 1080i • linear, power, logarithmic • 2K / HD 1080p • colour model • 4K / UHD-1 • chroma subsampling • 8K / UHD-2 • illuminant 9 10 Bit Depth Linear, Power, Logarithmic • 8 bit (28 = 256) «medium grey» • 10 bit (210 = 1 024) • linear: 18% • 12 bit (212 = 4 096) • power: 50% • 16 bit (216 = 65 536) • logarithmic: 50% • 24 bit (224 = 16 777 216) 11 12 Colour Model • XYZ, L*a*b* • RGB / R′G′B′ / CMY / C′M′Y′ • Y′IQ / Y′UV / Y′DBDR • Y′CBCR / Y′COCG • Y′PBPR 13 14 15 16 17 18 RGB24 00000000 11111111 00000000 00000000 00000000 00000000 11111111 00000000 00000000 00000000 00000000 11111111 00000000 11111111 11111111 11111111 11111111 00000000 11111111 11111111 11111111 11111111 00000000 11111111 19 20 Compression Uncompressed • uncompressed + data simpler to process • lossless compression + software runs faster • lossy compression – bigger files • chroma subsampling – slower writing, transmission and reading • born -
Pre-Roll & Mid-Roll Video
Pre-roll & Mid-roll Video 1/2 THIRD PARTY ALL ASSETS BELOW ARE REQUIRED VAST SPECIFICATIONS TO BE PRESENT IN THE VAST TAG Not available for live stream sponsorships or feature sponsorships. All assets for sponsored Bit rate Codecs accepted Min dimensions Max file size Use cases content must use the "Network 10 Hosted Video In-Stream Ad with Companion" specifications. Mezzanine File 15–30 Mbps H.264 1920x1080 1.7 GB Required for SSAI Aspect ratio Format (High profile) Environments 16:9 Video will auto-scale correctly Frame Rate: 24 :15 – 4.5MB High Codec Constant frame rate only 2,100 kbps H.264 Mezzanine File - .mov +/- 50 kbps (High profile) 1024x576 :30 – 9MB bandwidth (H.264 High Profile) No de-interlacing with :18 – 18MB users no frame blending mp4 (high profile) :15 – 3.5MB Standard asset Remove any pull-down 1,500 kbps H.264 +/- 50 kbps (High profile) 960x540 :30 – 7MB for most users webm (VP8 or VP9) added for broadcast :18 – 14MB and pre roll Duration Audio :15 – 1MB Low 750 kbps H.264 768x432 :30 – 2MB bandwidth Network 10 accepts a variety of length Mezzanine file: 2 Channels only, AAC +/- 50 kbps (High profile) :18 – 4MB users creatives, standards include :6*, :15, :30, Codec, 192 KBPS minimum, 16 or 24 bit Available on :60*, :90*. only, 48 kHz Sample Rate. :15 – 4.5MB High 375 kbps H.264 Any tag submitted must contain creative mp4 assets: 2 Channels only, AAC Codec, +/- 50 kbps (High profile) 640x360 :30 – 9MB bandwidth of all the same length. 192 KBPS minimum, 16 or 24 bit only, 48 :18 – 18MB users kHz Sample Rate. -
Amphion Video Codecs in an AI World
Video Codecs In An AI World Dr Doug Ridge Amphion Semiconductor The Proliferance of Video in Networks • Video produces huge volumes of data • According to Cisco “By 2021 video will make up 82% of network traffic” • Equals 3.3 zetabytes of data annually • 3.3 x 1021 bytes • 3.3 billion terabytes AI Engines Overview • Example AI network types include Artificial Neural Networks, Spiking Neural Networks and Self-Organizing Feature Maps • Learning and processing are automated • Processing • AI engines designed for processing huge amounts of data quickly • High degree of parallelism • Much greater performance and significantly lower power than CPU/GPU solutions • Learning and Inference • AI ‘learns’ from masses of data presented • Data presented as Input-Desired Output or as unmarked input for self-organization • AI network can start processing once initial training takes place Typical Applications of AI • Reduce data to be sorted manually • Example application in analysis of mammograms • 99% reduction in images send for analysis by specialist • Reduction in workload resulted in huge reduction in wrong diagnoses • Aid in decision making • Example application in traffic monitoring • Identify areas of interest in imagery to focus attention • No definitive decision made by AI engine • Perform decision making independently • Example application in security video surveillance • Alerts and alarms triggered by AI analysis of behaviours in imagery • Reduction in false alarms and more attention paid to alerts by security staff Typical Video Surveillance -
A Practical Approach to Spatiotemporal Data Compression
A Practical Approach to Spatiotemporal Data Compres- sion Niall H. Robinson1, Rachel Prudden1 & Alberto Arribas1 1Informatics Lab, Met Office, Exeter, UK. Datasets representing the world around us are becoming ever more unwieldy as data vol- umes grow. This is largely due to increased measurement and modelling resolution, but the problem is often exacerbated when data are stored at spuriously high precisions. In an effort to facilitate analysis of these datasets, computationally intensive calculations are increasingly being performed on specialised remote servers before the reduced data are transferred to the consumer. Due to bandwidth limitations, this often means data are displayed as simple 2D data visualisations, such as scatter plots or images. We present here a novel way to efficiently encode and transmit 4D data fields on-demand so that they can be locally visualised and interrogated. This nascent “4D video” format allows us to more flexibly move the bound- ary between data server and consumer client. However, it has applications beyond purely scientific visualisation, in the transmission of data to virtual and augmented reality. arXiv:1604.03688v2 [cs.MM] 27 Apr 2016 With the rise of high resolution environmental measurements and simulation, extremely large scientific datasets are becoming increasingly ubiquitous. The scientific community is in the pro- cess of learning how to efficiently make use of these unwieldy datasets. Increasingly, people are interacting with this data via relatively thin clients, with data analysis and storage being managed by a remote server. The web browser is emerging as a useful interface which allows intensive 1 operations to be performed on a remote bespoke analysis server, but with the resultant information visualised and interrogated locally on the client1, 2. -
(A/V Codecs) REDCODE RAW (.R3D) ARRIRAW
What is a Codec? Codec is a portmanteau of either "Compressor-Decompressor" or "Coder-Decoder," which describes a device or program capable of performing transformations on a data stream or signal. Codecs encode a stream or signal for transmission, storage or encryption and decode it for viewing or editing. Codecs are often used in videoconferencing and streaming media solutions. A video codec converts analog video signals from a video camera into digital signals for transmission. It then converts the digital signals back to analog for display. An audio codec converts analog audio signals from a microphone into digital signals for transmission. It then converts the digital signals back to analog for playing. The raw encoded form of audio and video data is often called essence, to distinguish it from the metadata information that together make up the information content of the stream and any "wrapper" data that is then added to aid access to or improve the robustness of the stream. Most codecs are lossy, in order to get a reasonably small file size. There are lossless codecs as well, but for most purposes the almost imperceptible increase in quality is not worth the considerable increase in data size. The main exception is if the data will undergo more processing in the future, in which case the repeated lossy encoding would damage the eventual quality too much. Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronization of the audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format. -
Multimedia Compression Techniques for Streaming
International Journal of Innovative Technology and Exploring Engineering (IJITEE) ISSN: 2278-3075, Volume-8 Issue-12, October 2019 Multimedia Compression Techniques for Streaming Preethal Rao, Krishna Prakasha K, Vasundhara Acharya most of the audio codes like MP3, AAC etc., are lossy as Abstract: With the growing popularity of streaming content, audio files are originally small in size and thus need not have streaming platforms have emerged that offer content in more compression. In lossless technique, the file size will be resolutions of 4k, 2k, HD etc. Some regions of the world face a reduced to the maximum possibility and thus quality might be terrible network reception. Delivering content and a pleasant compromised more when compared to lossless technique. viewing experience to the users of such locations becomes a The popular codecs like MPEG-2, H.264, H.265 etc., make challenge. audio/video streaming at available network speeds is just not feasible for people at those locations. The only way is to use of this. FLAC, ALAC are some audio codecs which use reduce the data footprint of the concerned audio/video without lossy technique for compression of large audio files. The goal compromising the quality. For this purpose, there exists of this paper is to identify existing techniques in audio-video algorithms and techniques that attempt to realize the same. compression for transmission and carry out a comparative Fortunately, the field of compression is an active one when it analysis of the techniques based on certain parameters. The comes to content delivering. With a lot of algorithms in the play, side outcome would be a program that would stream the which one actually delivers content while putting less strain on the audio/video file of our choice while the main outcome is users' network bandwidth? This paper carries out an extensive finding out the compression technique that performs the best analysis of present popular algorithms to come to the conclusion of the best algorithm for streaming data. -
Opus, a Free, High-Quality Speech and Audio Codec
Opus, a free, high-quality speech and audio codec Jean-Marc Valin, Koen Vos, Timothy B. Terriberry, Gregory Maxwell 29 January 2014 Xiph.Org & Mozilla What is Opus? ● New highly-flexible speech and audio codec – Works for most audio applications ● Completely free – Royalty-free licensing – Open-source implementation ● IETF RFC 6716 (Sep. 2012) Xiph.Org & Mozilla Why a New Audio Codec? http://xkcd.com/927/ http://imgs.xkcd.com/comics/standards.png Xiph.Org & Mozilla Why Should You Care? ● Best-in-class performance within a wide range of bitrates and applications ● Adaptability to varying network conditions ● Will be deployed as part of WebRTC ● No licensing costs ● No incompatible flavours Xiph.Org & Mozilla History ● Jan. 2007: SILK project started at Skype ● Nov. 2007: CELT project started ● Mar. 2009: Skype asks IETF to create a WG ● Feb. 2010: WG created ● Jul. 2010: First prototype of SILK+CELT codec ● Dec 2011: Opus surpasses Vorbis and AAC ● Sep. 2012: Opus becomes RFC 6716 ● Dec. 2013: Version 1.1 of libopus released Xiph.Org & Mozilla Applications and Standards (2010) Application Codec VoIP with PSTN AMR-NB Wideband VoIP/videoconference AMR-WB High-quality videoconference G.719 Low-bitrate music streaming HE-AAC High-quality music streaming AAC-LC Low-delay broadcast AAC-ELD Network music performance Xiph.Org & Mozilla Applications and Standards (2013) Application Codec VoIP with PSTN Opus Wideband VoIP/videoconference Opus High-quality videoconference Opus Low-bitrate music streaming Opus High-quality music streaming Opus Low-delay -
Encoding H.264 Video for Streaming and Progressive Download
W4: KEY ENCODING SKILLS, TECHNOLOGIES TECHNIQUES STREAMING MEDIA EAST - 2019 Jan Ozer www.streaminglearningcenter.com [email protected]/ 276-235-8542 @janozer Agenda • Introduction • Lesson 5: How to build encoding • Lesson 1: Delivering to Computers, ladder with objective quality metrics Mobile, OTT, and Smart TVs • Lesson 6: Current status of CMAF • Lesson 2: Codec review • Lesson 7: Delivering with dynamic • Lesson 3: Delivering HEVC over and static packaging HLS • Lesson 4: Per-title encoding Lesson 1: Delivering to Computers, Mobile, OTT, and Smart TVs • Computers • Mobile • OTT • Smart TVs Choosing an ABR Format for Computers • Can be DASH or HLS • Factors • Off-the-shelf player vendor (JW Player, Bitmovin, THEOPlayer, etc.) • Encoding/transcoding vendor Choosing an ABR Format for iOS • Native support (playback in the browser) • HTTP Live Streaming • Playback via an app • Any, including DASH, Smooth, HDS or RTMP Dynamic Streaming iOS Media Support Native App Codecs H.264 (High, Level 4.2), HEVC Any (Main10, Level 5 high) ABR formats HLS Any DRM FairPlay Any Captions CEA-608/708, WebVTT, IMSC1 Any HDR HDR10, DolbyVision ? http://bit.ly/hls_spec_2017 iOS Encoding Ladders H.264 HEVC http://bit.ly/hls_spec_2017 HEVC Hardware Support - iOS 3 % bit.ly/mobile_HEVC http://bit.ly/glob_med_2019 Android: Codec and ABR Format Support Codecs ABR VP8 (2.3+) • Multiple codecs and ABR H.264 (3+) HLS (3+) technologies • Serious cautions about HLS • DASH now close to 97% • HEVC VP9 (4.4+) DASH 4.4+ Via MSE • Main Profile Level 3 – mobile HEVC (5+) -
Multiple Reference Motion Compensation: a Tutorial Introduction and Survey Contents
Foundations and TrendsR in Signal Processing Vol. 2, No. 4 (2008) 247–364 c 2009 A. Leontaris, P. C. Cosman and A. M. Tourapis DOI: 10.1561/2000000019 Multiple Reference Motion Compensation: A Tutorial Introduction and Survey By Athanasios Leontaris, Pamela C. Cosman and Alexis M. Tourapis Contents 1 Introduction 248 1.1 Motion-Compensated Prediction 249 1.2 Outline 254 2 Background, Mosaic, and Library Coding 256 2.1 Background Updating and Replenishment 257 2.2 Mosaics Generated Through Global Motion Models 261 2.3 Composite Memories 264 3 Multiple Reference Frame Motion Compensation 268 3.1 A Brief Historical Perspective 268 3.2 Advantages of Multiple Reference Frames 270 3.3 Multiple Reference Frame Prediction 271 3.4 Multiple Reference Frames in Standards 277 3.5 Interpolation for Motion Compensated Prediction 281 3.6 Weighted Prediction and Multiple References 284 3.7 Scalable and Multiple-View Coding 286 4 Multihypothesis Motion-Compensated Prediction 290 4.1 Bi-Directional Prediction and Generalized Bi-Prediction 291 4.2 Overlapped Block Motion Compensation 294 4.3 Hypothesis Selection Optimization 296 4.4 Multihypothesis Prediction in the Frequency Domain 298 4.5 Theoretical Insight 298 5 Fast Multiple-Frame Motion Estimation Algorithms 301 5.1 Multiresolution and Hierarchical Search 302 5.2 Fast Search using Mathematical Inequalities 303 5.3 Motion Information Re-Use and Motion Composition 304 5.4 Simplex and Constrained Minimization 306 5.5 Zonal and Center-biased Algorithms 307 5.6 Fractional-pixel Texture Shifts or Aliasing