Discrete Sine and Cosine Transform and Helmholtz Equation Solver on GPU

2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)

Mingming Ren, Yuyang Gao∗, Gang Wang, Xiaoguang Liu
TJ Key Lab of NDST, College of Computer Science, Nankai University, Tianjin, China
{renmingming, gaoyy, wgzwp, liuxg}@nbjl.nankai.edu.cn

978-0-7381-3199-3/20/$31.00 ©2020 IEEE
DOI 10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00034

Abstract—The Helmholtz equation is a special kind of elliptic partial differential equation. Solving the Helmholtz equation is often needed in scientific and engineering problems. An efficient approach to solving it is to use the Fast Fourier Transform (FFT). In practice, boundary conditions must be considered, and several discrete Fourier-related transforms such as the Discrete Sine and Cosine Transforms (DST, DCT) are needed for solving problems with different boundary conditions. With the development of Compute Unified Device Architecture (CUDA) technology in recent years, many researchers use Graphics Processing Units (GPUs) to accelerate their programs. In view of the importance of the FFT, NVIDIA officially provides the library cuFFT, but it does not include the calculation of DST and DCT, which is inconvenient for users. In this paper, we present cuHelmholtz, which is designed and implemented based on CUDA. It can be used to compute several kinds of three-dimensional DST and DCT, and to solve three-dimensional Helmholtz equations with various boundary conditions. The experimental results show that, compared with FISHPACK, a FORTRAN software package which can be used to solve the three-dimensional Helmholtz equation, we achieve about 10× to 30× speedup.

Index Terms—Helmholtz equation, CUDA, GPU, Discrete Sine Transform, Discrete Cosine Transform

I. INTRODUCTION

The Helmholtz equation is a special kind of elliptic partial differential equation. In the process of solving many mathematical and physical problems (such as the wave equation, potential distribution in electrostatics and incompressible flow problems [1], [2], etc.) one may need to solve the Helmholtz equation (or its special cases, such as the Laplace equation and the Poisson equation). Furthermore, many problems need to be solved iteratively, and in each iteration one may need to solve a Helmholtz equation [3], [4]. Therefore, it is helpful to solve the Helmholtz equation as quickly as possible.

Fourier-related transforms have important applications in solving partial differential equations [5]–[7]. Many partial differential equations, including Helmholtz equations with constant coefficients, can be solved efficiently by Fourier-related transforms.

Fourier-related transforms have also been widely used in many other fields [8]. According to whether the signal is continuous or discrete in time, periodic or non-periodic, they can be divided into four categories: continuous time Fourier transform, discrete time Fourier transform, continuous time Fourier series and discrete time Fourier series. Thanks to the rapid development of digital technology, especially computer hardware and software technology, there are quite a lot of resources available for discrete signal processing. For example, the discrete-time Fourier series of periodic signals (i.e. the Discrete Fourier Transform, DFT [9]) is usually computed by the Fast Fourier Transform (FFT). Many well-known software packages such as MATLAB, Intel MKL and FFTW [10] provide corresponding implementations.

Fourier-based methods for solving the Helmholtz equation can handle not only periodic boundary conditions, but also other kinds of boundary conditions, such as Dirichlet or Neumann boundary conditions. By using the finite difference method, the solution can be obtained through variants of the DFT, such as the Discrete Sine Transform (DST) and the Discrete Cosine Transform (DCT) [9], [11]. Different variants of the DFT correspond to different homogeneous boundary conditions. Furthermore, non-homogeneous boundary conditions can be transformed into homogeneous boundary conditions. It is worth mentioning that the type 2 DCT is widely used in data compression, image processing and video coding [12], [13], and it is called "the" DCT in these fields.

Since Nvidia Corporation launched Compute Unified Device Architecture (CUDA), Graphics Processing Units (GPUs) have played an increasingly important role in scientific and engineering computing over the past decade. Many studies have used GPUs to optimize the computation process in order to reduce the run time. Nvidia also provides a library named cuFFT for computing the FFT. However, the library does not contain the calculation of DST and DCT, which makes it inconvenient to solve the Helmholtz equation with non-periodic boundary conditions on GPU.

To the best of the authors' knowledge, no published work computes all common kinds of DST and DCT on GPU, nor solves the three-dimensional Helmholtz equation with several common boundary conditions on GPU. Here we introduce some related works. Wu et al. [14], [15] computed the solution of the three-dimensional Poisson equation on a GPU platform for one Neumann boundary condition (the other two are still periodic boundary conditions). In [16], Ghetia et al. computed the type 2 DCT on a GPU platform in the two-dimensional case. Their method requires extra GPU memory and is not suitable for the calculation of large-scale three-dimensional transforms.

In this paper we design and implement algorithms for computing several kinds of three-dimensional DST and DCT on GPU, and use them to solve three-dimensional Helmholtz equations with various boundary conditions. The code library we implemented is named cuHelmholtz. The paper is organized as follows. We briefly introduce some basics of CUDA and cuFFT in Section II. Section III describes how to solve the Helmholtz equation by the finite difference method and the Fourier transform method. Section IV describes the design and implementation of cuHelmholtz. In Section V, we test and discuss the accuracy and performance of cuHelmholtz. Section VI is a brief summary.

II. CUDA PROGRAMMING MODEL AND CUFFT

A. CUDA Programming Model

CUDA makes GPUs very easy to use in general scientific and engineering computation. In the CUDA programming model, the CPU is the host, and the GPU is the co-processor or device. If we launch a kernel function on the GPU (device) side, it will be executed by a large number of GPU threads, which are organized into GPU thread blocks. The thread blocks form a grid. The threads are scheduled by the GPU thread scheduler to run on Streaming Multiprocessors (SMs) in units of warps, which are groups of 32 consecutive threads.

Host and device have separate memory spaces. GPU threads can access data from different types of device memory (including global memory, shared memory, registers, constant memory, etc.), of which global memory is the largest and slowest. Global memory is usually used to store intermediate results, which are generated by a previous kernel launch and consumed by subsequent kernel calls. Shared memory is an on-chip memory that is shared between threads of the same thread block. Registers are also on-chip memory and are allocated per thread. Constant memory can only store constant values, and the values in constant memory are cached.

Shared memory is often used to optimize memory-bound CUDA applications. In the current hardware specification, shared memory usually has 32 banks. When accessing shared memory, bank conflicts need to be considered. A bank conflict occurs if two threads in the same warp access different addresses in the same bank; the shared memory access requests then have to be performed serially, which causes performance degradation.

[…] plan, which corresponds to different data layouts and batched executions. Next, we can call execution functions such as cufftExecR2C() to do the corresponding transformations (the data layout was specified when creating the plan). The plan can be used to transform different data several times. When the plan is no longer needed, the function cufftDestroy() can be called to release the relevant resources.

III. SOLUTION OF HELMHOLTZ EQUATIONS

The three-dimensional Helmholtz equation in Cartesian coordinates is

\[ \Delta u(x, y, z) + \lambda u(x, y, z) = f(x, y, z) \tag{1} \]

where $u$ is an unknown function, $\lambda$ is a constant, the source term $f$ is a known function, and $\Delta$ is the Laplace operator $\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$. The functions $u$ and $f$ are defined on a rectangular area $x_s \le x \le x_f$, $y_s \le y \le y_f$, $z_s \le z \le z_f$. $\lambda$ is generally less than or equal to 0. For $\lambda > 0$, and for $\lambda = 0$ in some cases, the Helmholtz equation may not have a solution. If $\lambda = 0$, it is called the Poisson equation, and if further $f$ equals 0, it is the Laplace equation.

In order to solve the Helmholtz equation numerically, we use the finite difference method. We put a mesh in the solution area. The mesh has $N_x, N_y, N_z$ mesh points in the $x, y, z$ directions respectively. Our purpose is to find the value of $u$ at the mesh points, that is, we use the values of the function at discrete mesh points to represent the function. The denser the mesh (by increasing $N_x, N_y, N_z$), the more accurate the solution we can usually get. We denote the function value $u(x, y, z)$ at mesh point $(x_s + i\Delta x,\; y_s + j\Delta y,\; z_s + k\Delta z)$ as $u_{i,j,k}$, where $\Delta x, \Delta y, \Delta z$ are the grid widths, i.e. $\Delta x = \frac{x_f - x_s}{N_x}$, $\Delta y = \frac{y_f - y_s}{N_y}$, $\Delta z = \frac{z_f - z_s}{N_z}$. The discretized Helmholtz equation can be obtained by using the 7-point central difference scheme as follows:

\[ \frac{u_{i+1,j,k} - 2u_{i,j,k} + u_{i-1,j,k}}{(\Delta x)^2} + \frac{u_{i,j+1,k} - 2u_{i,j,k} + u_{i,j-1,k}}{(\Delta y)^2} + \frac{u_{i,j,k+1} - 2u_{i,j,k} + u_{i,j,k-1}}{(\Delta z)^2} + \lambda u_{i,j,k} = f_{i,j,k} \]
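
As a rough illustration of the 7-point central difference scheme, the following Python sketch plugs a known smooth function into the discretized Helmholtz equation and measures how well it is satisfied. Everything specific here is an assumption made for the example and is not taken from the paper: the unit cube as solution area, equal mesh counts in all three directions, λ = −1, the manufactured solution u = sin(πx) sin(πy) sin(πz) (for which Δu = −3π²u), and the hypothetical function name `helmholtz_residual`. It is plain CPU code and does not reflect how cuHelmholtz is implemented.

```python
import math


def helmholtz_residual(n, lam=-1.0):
    """Largest violation of the discretized Helmholtz equation at interior
    mesh points when the exact solution u = sin(pi x) sin(pi y) sin(pi z)
    is inserted (assumed test problem on the unit cube)."""
    h = 1.0 / n  # grid width (xf - xs) / Nx with xs = 0, xf = 1, Nx = Ny = Nz = n

    def u(i, j, k):
        # exact solution sampled at mesh point (i*h, j*h, k*h)
        return (math.sin(math.pi * i * h)
                * math.sin(math.pi * j * h)
                * math.sin(math.pi * k * h))

    worst = 0.0
    for i in range(1, n):
        for j in range(1, n):
            for k in range(1, n):
                c = u(i, j, k)
                # 7-point stencil: three second differences sharing the centre point
                lap = (u(i + 1, j, k) - 2.0 * c + u(i - 1, j, k)
                       + u(i, j + 1, k) - 2.0 * c + u(i, j - 1, k)
                       + u(i, j, k + 1) - 2.0 * c + u(i, j, k - 1)) / (h * h)
                # source term f = Laplacian(u) + lam * u for the chosen u
                f = (-3.0 * math.pi ** 2 + lam) * c
                worst = max(worst, abs(lap + lam * c - f))
    return worst


if __name__ == "__main__":
    for n in (8, 16):
        print(n, helmholtz_residual(n))
```

Because the central difference is second-order accurate, halving the grid width should reduce the reported residual by roughly a factor of four, which is one concrete sense in which a denser mesh gives a more accurate result.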