Neural Image Compression Via Non-Local Attention Optimization and Improved Context Modeling Tong Chen, Haojie Liu, Zhan Ma, Qiu Shen, Xun Cao, and Yao Wang

Total Page:16

File Type:pdf, Size:1020Kb

Neural Image Compression Via Non-Local Attention Optimization and Improved Context Modeling Tong Chen, Haojie Liu, Zhan Ma, Qiu Shen, Xun Cao, and Yao Wang 1 Neural Image Compression via Non-Local Attention Optimization and Improved Context Modeling Tong Chen, Haojie Liu, Zhan Ma, Qiu Shen, Xun Cao, and Yao Wang Abstract—This paper proposes a novel Non-Local Attention Fundamentally, image coding/compression is trying to ex- optmization and Improved Context modeling-based image com- ploit signal redundancy and represent the original pixel sam- pression (NLAIC) algorithm, which is built on top of the deep ples (in RGB or other color space such as YCbCr) using a nerual network (DNN)-based variational auto-encoder (VAE) structure. Our NLAIC 1) embeds non-local network operations compact and high-fidelity format. This is also referred to as the as non-linear transforms in the encoders and decoders for both source coding [5]. Conventional transform coding (e.g., JPEG, the image and the latent representation probability informa- JPEG 2000) or hybrid transform/prediction coding (e.g., intra tion (known as hyperprior) to capture both local and global coding of H.264/AVC and HEVC) is utilized. Here, typical correlations, 2) applies attention mechanism to generate masks transforms are Discrete Cosine Transform (DCT) [6], Wavelet that are used to weigh the features, which implicitly adapt bit allocation for feature elements based on their importance, and 3) Transform [7], and so on. Transforms referred here are usually implements the improved conditional entropy modeling of latent with fixed basis, that are trained in advance presuming the features using joint 3D convolutional neural network (CNN)-based knowledge of the source signal distribution. On the other autoregressive contexts and hyperpriors. Towards the practical hand, intra prediction usually leverages the local [8] and application, additional enhancements are also introduced to speed global correlations [9] to exploit the redundancy. Since intra up processing (e.g., parallel 3D CNN-based context prediction), reduce memory consumption (e.g., sparse non-local processing) prediction can be expressed as the linear superimposition of and alleviate the implementation complexity (e.g., unified model casual samples, it can be treated as an alternative transform for variable rates without re-training). The proposed model as well. Lossy compression is then achieved via applying the outperforms existing methods on Kodak and CLIC datasets with quantization on transform coefficients followed by an adaptive the state-of-the-art compression efficiency reported, including entropy coding. Thus, typical image compression pipeline learned and conventional (e.g., BPG, JPEG2000, JPEG) image compression methods, for both PSNR and MS-SSIM distortion can be simply illustrated by “transform”, “quantization” and metrics. “entropy coding” consecutively. Instead of applying the handcrafted components in ex- Index Terms—Non-local network, attention mechanism, con- ditional probability prediction, variable-rate model, end-to-end isting image compression methods, such as DCT, scalar learning quantization, etc, most recently emerged machine learning based image compression algorithms [10], [11], [12] leverage the autoencoder structure, which transforms raw pixels into I. INTRODUCTION compressible latent features via stacked convolutional neural Light reflected from the object surface, travels across the networks (CNNs) in a nonlinear means [13]. These latent 3D environment and finally reaches at the sensor plane of features are quantized and entropy coded subsequently by the camera or the retina of our Human Visual System (HVS) further exploiting the statistical redundancy. Recent works as a projected 2D image to represent the natural scene. have revealed that compression efficiency can be improved Nowadays, images spread everywhere via social networking when exploring the conditional probabilities via the contexts of (e.g., Facebook, WeChat), professional photography sharing autoregressive spatial-channel neighbors and hyperpriors [12], (e.g., flickr), online advertisement (e.g., Google Ads) and [14], [10], [15] for the compression of features. Typically, rate- arXiv:1910.06244v1 [eess.IV] 11 Oct 2019 so on, mostly in standard compliant formats compressed distortion optimization (RDO) [16] is fulfilled by minimizing using JPEG [1], JPEG2000 [2], H.264/AVC [3] or High Lagrangian cost J = R + λD, when performing the end-to- Efficiency Video Coding (HEVC) [4] intra-picture coding end learning. Here, R is referred to as entropy rate, and D is based image profile (a.k.a., Better Portable Graphics - BPG the distortion measured by either mean squared error (MSE), https://bellard.org/bpg/), etc. A better compression method1 multiscale structural similarity (MS-SSIM) [17], even feature is always desired to preserve the higher image quality but or adversarial loss [18], [19]. with less bits consumption. This would save the file storage However, existing methods still present several limitations. at the Internet scale (e.g., > 350 million images submitted For example, most of the operations, such as stacked convolu- and shared to Facebook per day), and enable the faster and tions, are performed locally with limited receptive field, even more efficient image sharing/exchange with better quality of with pyramidal decomposition. Furthermore, latent features experience (QoE). are mostly treated with equal importance in either spatial or channel dimension, without considering the diverse visual T. Chen and H. Liu contributed equally to this work. sensitivities to various content at different frequency [20]. 1Since the focus of this paper is lossy compression, for simplicity, we use “compression” to represent “lossy compression” in short throughout this work, Thus, attempts have been made in [14], [12] to exploit impor- unless pointed out specifically. tance maps on top of the quantized latent feature vectors for 2 Y Conv Conv Conv Conv Conv Conv Conv ​ ​ ​ ​ ​ m h ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock 5 5 5 5 5 × × × × × 5 5 5 5 5 × × × × × 192 192 192 192 192 192 192 NLAM NLAM NLAM ​ ​ ​ ​ ​ s s s s s Q Q 2 2 2 2 2 ˆ Xˆ Z AE​ Context Model AE​ 2 96 48 x 24 24 x x 1 × 1 1 5 ​ × 5 × 5 ˆ k Conv Mask Conv Mask Conv k Conv Y k Conv Reconstruction 2 2 2 2 2 s s s s s 2 AD​ AD​ s 3 × 192 192 192 192 192 192 384 5 × × × × × × 5 5 5 5 5 5 × × × × × 5 5 5 5 5 NLAM NLAM NLAM ​ ​ ​ ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock ResBlock Conv Conv ​ Conv Conv Conv Conv Conv Conv Conv ​ ​ ​ ​ ​ m h (a) Conv Conv ReLU X HWC :1 1 :1 1 g :1 1 1 x HWC2 HWC2 HWC2 1 Attention mask HW C 2 C2 HW HW C 2 NLN Sigmoid ResBlock ResBlock ResBlock Conv Conv Input feature Input Output feature Output HW HW softmax HW C 2 Y HWC2 z :1 1 ResBlock ResBlock ResBlock HWC Corresponding input Z (b) (c) Fig. 1: Non-Local Attention optimization and Improved Context modeling-based image compression - NLAIC. (a) NLAIC: a variational autoencoder with embedded non-local attention optimization in the main and hyperprior encoders and decoders (e.g., Em, Eh, Dm, and Dh). ”Conv 5×5×192 s2” indicates a convolution layer using a kernel of size 5×5, 192 output channels, and stride of 2 (in decoder Dm and Dh, ”Conv” indicates transposed convolution). NLAM represents the Non-Local Attention Modules. “Q” is for quantization, “AE” and “AD” are arithmetic encoding and decoding, P here denotes the probability model serving for arithmetic coding, k1 in context model means 3d conv kernel of size 1×1×1; (b) NLAM: The main branch consists of three ResBlocks. The mask branch combines non-local modules with ResBlocks for attention mask generation. The details of ResBlock is shown in the dash frame. (c) Non-local network (NLN): H × W × C denotes the size of feature maps with height H, width W and channel C. ⊕ is the add operation and ⊗ is the matrix multiplication. adaptive bit allocation, but still only at bottleneck layer. These also improve the context modeling of the entropy engine for methods usually signal the importance maps explicitly. If the better latent feature compression, by using a masked 3D CNN importance map is not embedded explicitly, coding efficiency (i.e., 5×5×5) based prediction to approximate more accurate will be slightly affected because of the probability estimation conditional statistics. error reported in [12]. Even with the coding efficiency outperforming most existing In this paper, our NLAIC introduces non-local processing traditional image compression standards, recent learning-based blocks into the variational autoencoder (VAE) structure to methods [21], [10], [12] are still far from the massive adoption capture both local and global correlations among pixels. Atten- in reality. For practical application, compression algorithms tion mechanism is also embedded to generate more compact need to be carefully evaluated and justified by its coding per- representation for both latent features and hyperpriors. Simple formance, computational and space complexity (e.g., memory rectified linear unit (ReLU) is applied for nonlinear activation. consumption), hardware implementation friendliness, etc. Few Different from those existing methods in [14], [12], our non- researches [11], [22] were developed in this line for practical local attention masks are applied at different layers (not only learned image compression coder. In this paper, we propose for quantized features at the bottleneck), to mask and adapt additional enhancements to simply the proposed NLAIC, intelligently through the end-to-end learning framework. We including 1) a unified network model for variable bitrates 3 using quality scaling factors; 2) sparse non-local processing II. RELATED WORK for memory reduction; and 3) parallel 3D masked CNN based context modeling for computational throughput improvement. In this section, we review prior works related to the non- All of these attempts have greatly reduce the space and time local operations in image/video processing, attention mecha- complexity of proposed NLAIC, with negligible sacrifice of nism, as well as the learned image compression algorithms. the coding efficiency.
Recommended publications
  • An Effective Self-Adaptive Policy for Optimal Video Quality Over Heterogeneous Mobile Devices and Network Discovery Services
    Appl. Math. Inf. Sci. 13, No. 3, 489-505 (2019) 489 Applied Mathematics & Information Sciences An International Journal http://dx.doi.org/10.18576/amis/130322 An Effective Self-Adaptive Policy for Optimal Video Quality over Heterogeneous Mobile Devices and Network Discovery Services Saleh Ali Alomari∗1 , Mowafaq Salem Alzboon1, Belal Zaqaibeh1 and Mohammad Subhi Al-Batah1 1Faculty of Science and Information Technology, Jadara University, 21110 Irbid, Jordan Received: 12 Dec. 2018, Revised: 1 Feb. 2019, Accepted: 20 Feb. 2019 Published online: 1 May 2019 Abstract: The Video on Demand (VOD) system is considered a communicating multimedia system that can allow clients be interested whilst watching a video of their selection anywhere and anytime upon their convenient. The design of the VOD system is based on the process and location of its three basic contents, which are: the server, network configuration and clients. The clients are varied from numerous approaches, battery capacities, involving screen resolutions, capabilities and decoder features (frame rates, spatial dimensions and coding standards). The up-to-date systems deliver VOD services through to several devices by utilising the content of a single coded video without taking into account various features and platforms of a device, such as WMV9, 3GPP2 codec, H.264, FLV, MPEG-1 and XVID. This limitation only provides existing services to particular devices that are only able to play with a few certain videos. Multiple video codecs are stored by VOD systems store for a similar video into the storage server. The problems caused by the bandwidth overhead arise once the layers of a video are produced.
    [Show full text]
  • Energy-Efficient Design of the Secure Better Portable
    Energy-Efficient Design of the Secure Better Portable Graphics Compression Architecture for Trusted Image Communication in the IoT Umar Albalawi Saraju P. Mohanty Elias Kougianos Computer Science and Engineering Computer Science and Engineering Engineering Technology University of North Texas, USA. University of North Texas, USA. University of North Texas. USA. Email: [email protected] Email: [email protected] Email: [email protected] Abstract—Energy consumption has become a major concern it. On the other hand, researchers from the software field in portable applications. This paper proposes an energy-efficient investigate how the software itself and its different uses can design of the Secure Better Portable Graphics Compression influence energy consumption. An efficient software is capable (SBPG) Architecture. The architecture proposed in this paper is suitable for imaging in the Internet of Things (IoT) as the main of adapting to the requirements of everyday usage while saving concentration is on the energy efficiency. The novel contributions as much energy as possible. Software engineers contribute to of this paper are divided into two parts. One is the energy efficient improving energy consumption by designing frameworks and SBPG architecture, which offers encryption and watermarking, tools used in the process of energy metering and profiling [2]. a double layer protection to address most of the issues related to As a specific type of data, images can have a long life if privacy, security and digital rights management. The other novel contribution is the Secure Digital Camera integrated with the stored properly. However, images require a large storage space. SBPG architecture. The combination of these two gives the best The process of storing an image starts with its compression.
    [Show full text]
  • Download Media Player Codec Pack Version 4.1 Media Player Codec Pack
    download media player codec pack version 4.1 Media Player Codec Pack. Description: In Microsoft Windows 10 it is not possible to set all file associations using an installer. Microsoft chose to block changes of file associations with the introduction of their Zune players. Third party codecs are also blocked in some instances, preventing some files from playing in the Zune players. A simple workaround for this problem is to switch playback of video and music files to Windows Media Player manually. In start menu click on the "Settings". In the "Windows Settings" window click on "System". On the "System" pane click on "Default apps". On the "Choose default applications" pane click on "Films & TV" under "Video Player". On the "Choose an application" pop up menu click on "Windows Media Player" to set Windows Media Player as the default player for video files. Footnote: The same method can be used to apply file associations for music, by simply clicking on "Groove Music" under "Media Player" instead of changing Video Player in step 4. Media Player Codec Pack Plus. Codec's Explained: A codec is a piece of software on either a device or computer capable of encoding and/or decoding video and/or audio data from files, streams and broadcasts. The word Codec is a portmanteau of ' co mpressor- dec ompressor' Compression types that you will be able to play include: x264 | x265 | h.265 | HEVC | 10bit x265 | 10bit x264 | AVCHD | AVC DivX | XviD | MP4 | MPEG4 | MPEG2 and many more. File types you will be able to play include: .bdmv | .evo | .hevc | .mkv | .avi | .flv | .webm | .mp4 | .m4v | .m4a | .ts | .ogm .ac3 | .dts | .alac | .flac | .ape | .aac | .ogg | .ofr | .mpc | .3gp and many more.
    [Show full text]
  • Encoding H.264 Video for Streaming and Progressive Download
    What is Tuning? • Disable features that: • Improve subjective video quality but • Degrade objective scores • Example: adaptive quantization – changes bit allocation over frame depending upon complexity • Improves visual quality • Looks like “error” to metrics like PSNR/VMAF What is Tuning? • Switches in encoding string that enables tuning (and disables these features) ffmpeg –input.mp4 –c:v libx264 –tune psnr output.mp4 • With x264, this disables adaptive quantization and psychovisual optimizations Why So Important • Major point of contention: • “If you’re running a test with x264 or x265, and you wish to publish PSNR or SSIM scores, you MUST use –tune PSNR or –tune SSIM, or your results will be completely invalid.” • http://x265.org/compare-video-encoders/ • Absolutely critical when comparing codecs because some may or may not enable these adjustments • You don’t have to tune in your tests; but you should address the issue and explain why you either did or didn’t Does Impact Scores • 3 mbps football (high motion, lots of detail) • PSNR • No tuning – 32.00 dB • Tuning – 32.58 dB • .58 dB • VMAF • No tuning – 71.79 • Tuning – 75.01 • Difference – over 3 VMAF points • 6 is JND, so not a huge deal • But if inconsistent between test parameters, could incorrectly show one codec (or encoding configuration) as better than the other VQMT VMAF Graph Red – tuned Green – not tuned Multiple frames with 3-4-point differentials Downward spikes represent untuned frames that metric perceives as having lower quality Tuned Not tuned Observations • Tuning
    [Show full text]
  • Screen Capture Tools to Record Online Tutorials This Document Is Made to Explain How to Use Ffmpeg and Quicktime to Record Mini Tutorials on Your Own Computer
    Screen capture tools to record online tutorials This document is made to explain how to use ffmpeg and QuickTime to record mini tutorials on your own computer. FFmpeg is a cross-platform tool available for Windows, Linux and Mac. Installation and use process depends on your operating system. This info is taken from (Bellard 2016). Quicktime Player is natively installed on most of Mac computers. This tutorial focuses on Linux and Mac. Table of content 1. Introduction.......................................................................................................................................1 2. Linux.................................................................................................................................................1 2.1. FFmpeg......................................................................................................................................1 2.1.1. installation for Linux..........................................................................................................1 2.1.1.1. Add necessary components........................................................................................1 2.1.2. Screen recording with FFmpeg..........................................................................................2 2.1.2.1. List devices to know which one to record..................................................................2 2.1.2.2. Record screen and audio from your computer...........................................................3 2.2. Kazam........................................................................................................................................4
    [Show full text]
  • Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks
    Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Troy Chinen, Sung Jin Hwang, Joel Shor, George Toderici {nickj, damienv, dminnen, covell, saurabhsingh, tchinen, sjhwang, joelshor, gtoderici} @google.com, Google Research Abstract formation from the current residual and combines it with context stored in the hidden state of the recurrent layers. By We propose a method for lossy image compression based saving the bits from the quantized bottleneck after each it- on recurrent, convolutional neural networks that outper- eration, the model generates a progressive encoding of the forms BPG (4:2:0), WebP, JPEG2000, and JPEG as mea- input image. sured by MS-SSIM. We introduce three improvements over Our method provides a significant increase in compres- previous research that lead to this state-of-the-art result us- sion performance over previous models due to three im- ing a single model. First, we modify the recurrent architec- provements. First, by “priming” the network, that is, run- ture to improve spatial diffusion, which allows the network ning several iterations before generating the binary codes to more effectively capture and propagate image informa- (in the encoder) or a reconstructed image (in the decoder), tion through the network’s hidden state. Second, in addition we expand the spatial context, which allows the network to lossless entropy coding, we use a spatially adaptive bit to represent more complex representations in early itera- allocation algorithm to more efficiently use the limited num- tions. Second, we add support for spatially adaptive bit rates ber of bits to encode visually complex image regions.
    [Show full text]
  • Spin Digital HEVC/H.265 Software Encoder (Spin Enc) Enables Ultra-High-Quality Video with the Highest Compression Level
    Spin Digital HEVC/H.265 software encoder (Spin Enc) enables ultra-high-quality video with the highest compression level. Encoding, transmission, and storage of video in 4K, 8K, and beyond are now possible with commodity computing technologies. HEVC/H.265 encoder for production, contribution, and distribution of professional video for broadcast, VoD, and creative studios. Spin Digital HEVC/H.265 encoder is ready for the next generation of high-quality video systems, providing support for Ultra HD (UHD), High Dynamic Range (HDR), High Frame Rate (HFR), Wide Color Gamut (WCG), and Virtual Reality (360° video). Product Highlights HEVC/H.265 Encoder Package • Fast offline encoding software solution • Encoder: standalone HEVC/H.265 encoder • Ready for 8K and beyond • Transcoder: HEVC/H.265 decoder and encoder • Significantly better compression and quality integrated in FFmpeg than competing encoders • SDK: encoder plugin for FFmpeg (Libavcodec) • Enables WCG and HDR with up to 12-bit video • Compatible with ARIB STD-B32 standard • Versatile high-precision pre-processing filters • Preserves color resolution with 4:2:2, 4:4:4, and RGB formats • 22.2-ch AAC audio encoding and decoding SPIN DIGITAL HEVC/H.265 ENCODER Support for the HEVC standard: Main and Main 10 profiles Range Extensions (HEVCv2) profiles ARIB STD-B32 version 3.9 Resolutions: 4K, 8K, and beyond Color formats: 4:2:0, 4:2:2, 4:4:4, RGB Bit depths: 8-, 10-, 12-bit Color spaces: BT.601, BT.709, DCI-P3, BT.2020 HDR support: ST2084 transfer function, ST2086 HDR metadata, HLG Coding
    [Show full text]
  • Home for Christmas S01E01 INTERNAL 720P WEB X264strife Eztv Thepiratebay
    1 / 2 Home For Christmas S01E01 INTERNAL 720p WEB X264-STRiFE [eztv] - ThePirateBay http://nextisp.com/index.php/Home/Drama/detail/p/a-little-love-never-hurts/uid/0.html - A ... [url=http://thebeststar.us/torrent/1663028501/UFC+210+Prelims+720p+WEB-DL+ ... in Fire S04E09 The Charay iNTERNAL 720p HDTV x264-DHD[ettv][/url] ... [url=http://almuzaffar.org/christmas-is-here-again-movie]Christmas Is Here .... Us.S01.COMPLETE.720p.WEB.x264-GalaxyTV2019-05-31 VIP 1.22 GiB 285 sotnikam · Video > HD MoviesThey.Shall.Not.Grow.Old.2018.1080p.BluRay.H264.. Download Chloe Neill - Chicagoland Vampires and Dark Elite torrent or any other torrent from the Other E-books. Direct download via magnet link.Missing: Home ‎Christmas ‎S01E01 ‎INTERNAL ‎WEB ‎x264- ‎[eztv] -. ... i ian dervish · file 3245112 trailer.park.boys.out.of.the.park.s02e01.720p.web.x264 strife ... file 3086561 unearthed.2016.s02e05.internal.720p.hevc.x265 megusta ... christmas cookie challenge s03e05 colors of christmas 480p x264 msd eztv ... 720p bluray yts yify · file 817715 miami.monkey.s01e01.480p.hdtv.x264 msd .... Nov 15, 2014 — Ninja x men 1 720p dual audio Любовь РїРѕРґ ... Home And Away S23E93 WS PDTV XviD-AUTV kapitein rob en het geheim van ... й”法提琴手 Нанолюбовь (10) web dl inside amy schumer ... Low Winter Sun S01E01 2013 HDTV x264 EZN ettv Mad Men (Season 06 Episode .... Home for Christmas S01E01 INTERNAL WEB x264-STRiFE EZTV torrent download - download for free Home for Christmas S01E01 INTERNAL WEB .... ... file 4554555 brave.new.world.s01e07.internal.1080p.hevc.x265 megusta eztv ..
    [Show full text]
  • Comparison of JPEG's Competitors for Document Images
    Comparison of JPEG’s competitors for document images Mostafa Darwiche1, The-Anh Pham1 and Mathieu Delalandre1 1 Laboratoire d’Informatique, 64 Avenue Jean Portalis, 37200 Tours, France e-mail: fi[email protected] Abstract— In this work, we carry out a study on the per- in [6] concerns assessing quality of common image formats formance of potential JPEG’s competitors when applied to (e.g., JPEG, TIFF, and PNG) that relies on optical charac- document images. Many novel codecs, such as BPG, Mozjpeg, ter recognition (OCR) errors and peak signal to noise ratio WebP and JPEG-XR, have been recently introduced in order to substitute the standard JPEG. Nonetheless, there is a lack of (PSNR) metric. The authors in [3] compare the performance performance evaluation of these codecs, especially for a particular of different coding methods (JPEG, JPEG 2000, MRC) using category of document images. Therefore, this work makes an traditional PSNR metric applied to several document samples. attempt to provide a detailed and thorough analysis of the Since our target is to provide a study on document images, aforementioned JPEG’s competitors. To this aim, we first provide we then use a large dataset with different image resolutions, a review of the most famous codecs that have been considered as being JPEG replacements. Next, some experiments are performed compress them at very low bit-rate and after that evaluate the to study the behavior of these coding schemes. Finally, we extract output images using OCR accuracy. We also take into account main remarks and conclusions characterizing the performance the PSNR measure to serve as an additional quality metric.
    [Show full text]
  • A Complete End-To-End Open Source Toolchain for the Versatile Video Coding (VVC) Standard
    A Complete End-To-End Open Source Toolchain for the Versatile Video Coding (VVC) Standard Adam Wieckowski*, Christian Lehmann*, Benjamin Bross*, Detlev Marpe*, Thibaud Biatek+, Mickael Raulet+, Jean Le Feuvre$ *Video Communication and Applications Department, Fraunhofer HHI, Berlin, Germany +ATEME, Vélizy-Villacoublay, France $LTCI, Telecom Paris, Institut Polytechnique de Paris, France {firstname.lastname}@hhi.fraunhofer.de, {t.biatek,m.raulet}@ateme.com, [email protected] ABSTRACT Standard. In Proceedings of the 29th ACM International Conference on Multimedia (MM’21). ACM, Chengdu, China, 4 pages. Versatile Video Coding (VVC) is the most recent international https://doi.org/10.1145/1234567890 video coding standard jointly developed by ITU-T and ISO/IEC, which has been finalized in July 2020. VVC allows for significant bit-rate reductions around 50% for the same subjective video 1 Introduction quality compared to its predecessor, High Efficiency Video In July 2021, the Versatile Video Coding (VVC) standard has Coding (HEVC). One year after finalization, VVC support in been finalized by the Joint Video Experts Team (JVET) of ITU-T devices and chipsets is still under development, which is aligned VCEG and ISO/IEC MPEG [1][2]. VVC was designed with two with the typical development cycles of new video coding main objectives: significant bit-rate reduction over its predecessor standards. This paper presents open-source software packages that High Efficiency Video Coding (HEVC) for the same perceived allow building a complete VVC end-to-end toolchain already one video quality and versatility to facilitate coding and transport for a year after its finalization. This includes the Fraunhofer HHI wide range of applications and content types.
    [Show full text]
  • Forcepoint DLP Supported File Formats and Size Limits
    Forcepoint DLP Supported File Formats and Size Limits Supported File Formats and Size Limits | Forcepoint DLP | v8.8.1 This article provides a list of the file formats that can be analyzed by Forcepoint DLP, file formats from which content and meta data can be extracted, and the file size limits for network, endpoint, and discovery functions. See: ● Supported File Formats ● File Size Limits © 2021 Forcepoint LLC Supported File Formats Supported File Formats and Size Limits | Forcepoint DLP | v8.8.1 The following tables lists the file formats supported by Forcepoint DLP. File formats are in alphabetical order by format group. ● Archive For mats, page 3 ● Backup Formats, page 7 ● Business Intelligence (BI) and Analysis Formats, page 8 ● Computer-Aided Design Formats, page 9 ● Cryptography Formats, page 12 ● Database Formats, page 14 ● Desktop publishing formats, page 16 ● eBook/Audio book formats, page 17 ● Executable formats, page 18 ● Font formats, page 20 ● Graphics formats - general, page 21 ● Graphics formats - vector graphics, page 26 ● Library formats, page 29 ● Log formats, page 30 ● Mail formats, page 31 ● Multimedia formats, page 32 ● Object formats, page 37 ● Presentation formats, page 38 ● Project management formats, page 40 ● Spreadsheet formats, page 41 ● Text and markup formats, page 43 ● Word processing formats, page 45 ● Miscellaneous formats, page 53 Supported file formats are added and updated frequently. Key to support tables Symbol Description Y The format is supported N The format is not supported P Partial metadata
    [Show full text]
  • Guideline: AI-Based Super Resolution Upscaling
    Guideline Document name Content production guidelines: AI-Based Super Resolution Upscaling Project Acronym: IMMERSIFY Grant Agreement number: 762079 Project Title: Audiovisual Technologies for Next Generation Immersive Media Revision: 0.5 Authors: Ali Nikrang, Johannes Pöll Support and review Diego Felix de Souza, Mauricio Alvarez Mesa Delivery date: March 31 2020 Dissemination level (Public / Confidential) Public Abstract This guideline aims to give an overview of the state of the art software-implementations for video and image upscaling. The guideline focuses on the methods that use AI in the background and are available as standalone applications or plug-ins for frequently used image and video processing software. The target group of the guideline is artists without any previous experiences in the field of artificial intelligence. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement 762079. TABLE OF CONTENTS ​ ​ ​ ​ ​ Introduction 3 Using Machine Learning for Upscaling 4 General Process 4 Install FFmpeg 5 Extract the Frames and the Audio Stream 5 Reassemble the Frames and Audio Stream After Upscaling 5 Upscaling with Waifu2x 6 Further Information 8 Upscaling with Adobe CC 10 Basic Image Upscale using Adobe Photoshop CC 10 Basic Video Upscales using Adobe CC After Effects 15 Upscaling with Topaz Video Enhance AI 23 Topaz Video Enhance AI workflow 23 Conclusion 28 1 Introduction Upscaling of digital pictures (such as video frames) refers to increasing the resolution of images using algorithmic methods. This can be very useful when for example, artists work with low-resolution materials (such as old video footages) that need to be used in a high-resolution output.
    [Show full text]