Accelerating Spherical Harmonic Transforms on the NVIDIA GPU
Total Page:16
File Type:pdf, Size:1020Kb
ECE 734
Project Proposal
Accelerating Spherical Harmonic Transforms on the NVIDIA® GPU
Vikrant Soman
Abstract The governing equations for global spectral weather model are derived from the conservation laws of mass, momentum, and energy. Vorticity, divergence, temperature, surface pressure and moisture equations are the main constituents of it. Expansion of the global field is done using spherical harmonics. Thus, spherical harmonic transform is a critical computational kernel of the dynamics algorithms for numerical weather prediction and climate modeling. The spherical harmonic transform is used to project grid point data on the sphere onto the spectral modes in an analysis step and an inverse transform reconstructs grid point data from the spectral information in a synthesis step. As atmospheric models push towards higher resolutions it has become necessary to accelerate this computationally intensive transform. This project aims to take a step in this direction by exploiting the recent technology of NVIDIA’s Graphics Processors Unit (GPU) which is a SIMD type architecture.
Background and Prior Work
The task of parallelizing the computation of spherical harmonic transforms is not a new art. Infact climate and weather modelers were one of the first users of parallel computers. The calculation of the spherical harmonic transform, is a two step process and methods mentioned in [2] and [4] take advantage of this and parallelize each of the steps. Other methods are on the lines of approximating the original transform equations in a Fourier basis since FFT implementations are computationally quicker. The algorithm explained in [1] gives a matrix implementation of the spherical harmonic transform which essentially makes the transform vectorizable. This is of interest in the context of the project since the CUDA provides some tremendous acceleration where matrix operations are concerned.
Project Overview
a. Approach
In this project I would be using the spherical harmonic transform algorithm mentioned in [1] for the CUDA implementation. The project will involve profiling the codes for forward spherical harmonic transform called analysis and reverse spherical harmonic transform called synthesis. Once the loops and operations taking the most time are found the next step would be redesigning the loops and matrix computations and map it to the GPU architecture to exploit the parallelism the GPU can provide. Thus the effective implementation I propose is a combined CPU + GPU, which I feel would give advantage of offloading the multidata single operation parts of the code to the GPU while the CPU can do effective computations on returned data. In the end the optimized code will be compared with the original code in performing tasks like power spectrum, energy spectrum computation and the gain in terms of computation time will be noted. Also the error in forward and reverse transform between the original code versus a CPU/GPU implementable code will be studied. Effect of single precision and double precision will also be studied in the project. b. Tasks and Milestones
1. Profile the algorithm mentioned in [1]. Find out the time required for the spherical harmonic transform on CPU and for various grid sizes. 2. Find the various areas for optimization. 3. Read about the GPU architecture and study CUDA. Implement simple programs for understanding the nuances of CUDA programs. 4. Redesign the matrix operations and loops that can be implemented in parallel keeping in mind the GPU architecture. 5. Write the codes for performing the operations marked for offloading to the GPU for parallel implementation. 6. Combine CPU matlab code and GPU code using the Matlab plug in for GPU. Resolve issues and run codes for various grid sizes. Observe effect for single precision versus double precision. Compute the error graphs between the forward and reverse transform values using the CPU code versus the CPU/GPU code. 7. Compare the values obtained in step [1]. Finally use both codes for finding the energy spectrum, power spectrum and observe gain in terms of computation time.
c. Expected Results
I expect that the forward and reverse transform should show significant gains in the computation time required based on the fact that the GPU will parallelize the matrix vector computations.
References
1. Drake, J. B., Worley, P., and D’Azevedo, E. 2008. Algorithm 888: Spherical harmonic transform algorithms. ACM Trans. Math. Softw. 35, 3, Article 23 (October 2008)
2. Akshara Kaginalkar, Sharad Purohit, Benchmarking of Medium Range Weather Forecasting Model on PARAM -A parallel machine, Center for Development of Advanced Computing (C-DAC), Pune University Campus, Pune 411007 India
3. Martin J. Mohlenkamp, A Fast Transform for Spherical Harmonics, The Journal of Fourier Analysis and Applications, 1999
4. Huadong Xiao, Yang Lu, Parallel computation for spherical harmonic synthesis and analysis, Computers & Geosciences, Volume 33, Issue 3, March 2007
5. NVIDIA CUDA Programming Guide