Evaluating Image Compression Methods on Two Dimensional Height Representations


Linköping University | Department of Electrical Engineering
Master's thesis, 30 ECTS | Information Coding
2020 | LIU-ISY/ISY-A-EX-A--20/5344--SE

Oscar Sjöberg

Supervisor: Harald Nautsch
Examiner: Ingemar Ragnemalm
External supervisor: Filip Thorardsson

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.
The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Oscar Sjöberg

Abstract

The need for compression and sparse representation varies across the fields of computer science. One field that has always benefited from compression, and has been an integral part of its vocabulary, is the digital image. There are, however, many other things that can be done with data structured in a two dimensional raster. In this report standard lossy image compression techniques are evaluated on a DTM (Digital Terrain Model) to assess their applicability in the height domain. Tests were performed using different open source software to increase repeatability. The best performing codec, JPEG2000, was also introduced into a 3D graphics context, enabling a subjective evaluation of the gain from smaller disk space relative to higher resolution. JPEG2000 also permitted the data to be loaded in a more efficient manner, thanks to the progressive capabilities of the codec.
Acknowledgments

This master's thesis would not have come to exist without all the inventiveness and positive energy that rubbed off from the different people I have encountered during my studies here in Linköping, both at the university and at the Vricon office, so I owe my thanks to many! However, in the pursuit of brevity, and to avoid a harangue, I want to direct my acknowledgement to the people directly responsible for this report. Thus, big thanks to both my examiner, Ingemar Ragnemalm, and my supervisor, Harald Nautsch, for the guidance you provided during the project. The mastermind behind the thesis, Filip Thorardson, has also been eager to respond to questions and thereby deserves many thanks. Last but not least I want to thank my friend Simon Kantedal, not only for giving structural feedback and reading this report but also for being a source of inspiration and a motivational force all through my studies at Linköping University.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research questions
  1.4 Delimitation
2 Theory
  2.1 The application
  2.2 The two dimensional height raster
  2.3 Compression of Images
  2.4 JPEG
  2.5 WebP
  2.6 JPEG2000
  2.7 Metrics for comparison
3 Method
  3.1 Compressors
  3.2 Quality parameter
  3.3 Sub-division in frequency domain
  3.4 Height Raster Pre and Post Processing
4 Results
  4.1 Height data compression
  4.2 In the geometric context
5 Discussion
  5.1 Literature Criticism
  5.2 The Evaluating Metrics
  5.3 Others Results
  5.4 Other Compression Schemes
  5.5 The work in a wider context
6 Conclusion
Bibliography

List of Figures

2.1 Vricon Explorer
2.2 Conceptualisation of a DWT subband cascaded filter response
2.3 JPEG compression scheme
2.4 DCT Basis and the zigzag deconstruction pattern
2.5 Example of quantisation tables used in JPEG quantisation step
2.6 JPEG box artefacts
2.7 WebP compression scheme
2.8 RIFF internal layout
2.9 JPEG2000 compression scheme
2.10 Dyadic tree structure
2.11 The EBCOT paradigm
2.12 Partitioning of subbands, precincts and code-blocks in JPEG2000
2.13 Quad-tree structure of the stripes and their bit-plane, in a code-block
2.14 The JPEG2000 marked up codestream
4.1 PSNR given bits/pixel, where jp2 corresponds to JPEG2000, jpg to JPEG and webp to WebP
4.2 SSIM given bits/pixel, where jp2 corresponds to JPEG2000, jpg to JPEG and webp to WebP
4.3 PSNR given bits/pixel (high compression rate), where jp2 corresponds to JPEG2000, jpg to JPEG and webp to WebP
4.4 SSIM given bits/pixel (high compression rate), where jp2 corresponds to JPEG2000, jpg to JPEG and webp to WebP
4.5 Evaluating PSNR and SSIM for different block sizes in the JPEG codec
4.6 PSNR and SSIM for different quality layers in the JPEG2000 codec
4.7 JPEG2000 coded data with falling SNR 80, 60, 35 and 15 dB as input
4.8 Comparing PSNR and SSIM between different resolution layers, 3-9, compressed with ratio 0-75
4.9 Display texturised globe in Vricon Explorer: old map to left and new map, JPEG2000, to the right
4.10 Display globe in Vricon Explorer through height shader: old map to left and new map, JPEG2000, to the right

List of Tables

2.1 Tabular representation of JFIF format

1 Introduction

Compression is a cornerstone of modern data science on which many, if not all, everyday media applications are heavily dependent. Video and music streaming, an integral part of modern web services, could not meet the latency and quality demands of the everyday user if compression schemes were left out of the equation.
With these compression methods, data and information can flow more freely and be more accessible to users, independent of the user's system. This is also true for the data used in non web based software. By using data more efficiently and taking up less disk space, applications can be made more available. It also gives creators more freedom to design and scale their software in line with their wishes. This thesis examines the possibility of using standard image compression methods on data structured in the same manner as images, the two dimensional raster. However, these rasters depict something completely different from colour, namely heights.

1.1 Motivation

So why would one need height data expressed in a two dimensional array? There are several reasons. Representing data in two dimensional arrays, such as rasters, comes with many useful perks, such as matrix computation for transforms. Nevertheless, saving high quality rasters without destroying the data is problematic and uses a lot of disk space compared to other data formats. So if this non-colour data, meaning height data, could be compressed in a similar manner to its chromatic counterpart, this would greatly benefit the quest for a sparse representation of data. The motivating factor for this refactoring of height data is to be able to introduce the compressed raster in the graphical tool Vricon Explorer, which is described further in Section 2.1.

1.2 Aim

Two dimensional data can be used in a range of different applications, not only for representing colours in images and video. The heights in this application are used for a geometric purpose: displacing vertices in a three dimensional object, creating geometric variations on a surface. The object in question is the earth, the globe, and the heights are the actual heights, in meters above (and under) sea level.
The aim is thus to represent the vertex positions of this three dimensional model as accurately and as aesthetically pleasingly as possible, while using as little storage space as possible.

1.3 Research questions

Image compression schemes have been investigated thoroughly over the decades, turned inside out and back again, to prove and disprove their superiority when compressing different types of image data, such as real image data [12], [14], medical applications [18][21] and many more.
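
To make the trade-off stated in the aim concrete, the following is a minimal sketch, not the pipeline used in the thesis, of how a height raster could be mapped to 16-bit integer samples (the kind of input an image codec expects) and how the resulting error could be measured with PSNR, one of the comparison metrics named in the contents. The function names, the linear 16-bit mapping and the tiny sample raster are all assumptions made purely for illustration.

```python
import math

def quantise_heights(raster, levels=65535):
    """Map heights in metres linearly onto integers 0..levels (assumed scheme)."""
    lo = min(min(row) for row in raster)
    hi = max(max(row) for row in raster)
    span = (hi - lo) or 1.0  # avoid division by zero for a completely flat raster
    samples = [[round((h - lo) / span * levels) for h in row] for row in raster]
    return samples, lo, span

def dequantise_heights(samples, lo, span, levels=65535):
    """Invert quantise_heights, returning heights in metres."""
    return [[lo + s / levels * span for s in row] for row in samples]

def psnr(original, restored):
    """Peak signal-to-noise ratio in dB, using the original height range as peak."""
    flat_o = [h for row in original for h in row]
    flat_r = [h for row in restored for h in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_r)) / len(flat_o)
    if mse == 0:
        return math.inf  # identical rasters
    peak = max(flat_o) - min(flat_o)
    return 10 * math.log10(peak ** 2 / mse)

# Hypothetical 3x3 raster: metres above (positive) and under (negative) sea level.
heights = [[-12.5, 0.0, 3.2],
           [40.1, 812.7, 1500.0],
           [2.0, -0.4, 95.3]]

samples, lo, span = quantise_heights(heights)
restored = dequantise_heights(samples, lo, span)
print(f"PSNR after 16-bit quantisation: {psnr(heights, restored):.1f} dB")
```

The sketch only measures the error introduced by the integer mapping itself. Passing the integer samples through a lossy codec before mapping them back would add further error and lower the PSNR, which is the kind of quality-versus-size relation the thesis sets out to evaluate.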