Real-Time Layered Video Compression using SIMD Computation

Morten Vadskær Jensen and Brian Nielsen

Aalborg University

Department of Computer Science

Fredrik Bajers Vej 7E

DK-9220 Aalborg, Denmark

{mvj, bnielsen}@cs.auc.dk

Abstract. We present the design and implementation of a high performance software layered video codec designed for deployment in bandwidth heterogeneous networks. The codec facilitates layered spatial and SNR (signal-to-noise ratio) coding for bit-rate adaption to a wide range of receiver capabilities. The codec uses a wavelet subband decomposition for spatial layering and a discrete cosine transform combined with repeated quantization for SNR layering.

Through the use of the Visual Instruction Set on Sun's UltraSPARC platform we demonstrate how SIMD parallel image processing enables layered real-time software encoding and decoding. The codec partitions our test video stream into 21 layers and reconstructs it, with both encoding and decoding running faster than the stream's real-time frame rate. The Visual Instruction Set accelerated encoder stages are several times as fast as an optimized C version. We find that this speedup is well worth the extra implementation effort.

1 Introduction

Smart technology for multicasting live digital video across wide area networks is necessary for future applications like video conferencing, distance learning, and telecommuting. The Internet Multicast Backbone (MBone) [1] is already popular and allows people from anywhere on the planet to exchange modest quality video and audio signals. However, a fundamental problem with nearly all large computer networks is that network capacity (bandwidth) varies extremely from one network segment to another. This variation is the source of a serious video multicast problem: all users wish to participate in the video conference with the highest quality video that their connection capacity (varying from kbps-range ISDN dial-up connections to Mbit/s-range LAN connections) and host computational resources allow. High capacity receivers are capable of processing good quality video with bit rates of several Mbps or more; this outperforms the low capacity receivers by more than an order of magnitude. Even a modest bit rate may flood low capacity receivers.

Conventional video compression techniques code the video signal to a fixed target bit rate; the coded stream is then multicast to all receivers. A low target bit rate is typically chosen to enable as many receivers as possible to participate. However, the resulting quality is unacceptable for the high capacity receivers. Alternatively, the video signal can be encoded repeatedly to a multitude of data rates. This is, however, very inefficient. Firstly, it involves extra computational overhead from compressing the same frame repeatedly. Secondly, it introduces bandwidth redundancy, as each high bit-rate video stream contains all the information also included in the lower bit-rate ones. This translates to inefficient bandwidth utilization from the root of the multicast tree.

We propose layered coding coupled with multicasting, where the video stream is coded and compressed into several distinct layers of significance, as shown in Figure 1. In principle, each layer is transmitted on its own multicast channel. The receivers may subscribe to these channels according to their capabilities and preferences. The more layers received, the higher the video quality, but also the higher the bandwidth and processing power requirements. The most significant layer constitutes the base layer, and the following layers are enhancement layers.

Fig. 1. Conventional single bit-rate video coding (a) vs. layered coding (b).

Live video requires that video frames are encoded and decoded in real time. This is even more challenging for layered coding than for conventional coding, because layer construction and reassembly require additional image filter functions and repeated processing of image data, and hence more CPU processing. We believe it is important that this coding is possible in real time on modern general purpose processors without dedicated external codec hardware, partly because no current hardware unit has the functionality we advertise, but more importantly because applications in the near future will integrate video as a normal data type and manipulate it in an application dependent manner. This requires significant flexibility. Fortunately, most modern CPU architectures have been extended with SIMD instructions to speed up digital signal processing. Examples include VIS in Sun's UltraSPARC [7, 10], MMX in Intel's Pentium II, MAX in Hewlett-Packard's PA-RISC, MDMX in Silicon Graphics' MIPS, MVI in Digital's Alpha, and recently AltiVec in Motorola's PowerPC CPUs.

In this paper we show a high performance implementation of a layered codec capable of constructing a large set of layers from reasonably sized test video in real time. We demonstrate how SIMD parallelism and careful consideration of superscalar processing can speed up image processing significantly. Our encoder implementation exists both in an optimized C version and in a version almost exclusively using Sun Microsystems' Visual Instruction Set (VIS), available on the Sun UltraSPARC platform. The decoder exists only in a VIS accelerated version.

The remainder of the paper is structured as follows. Section 2 presents our layered coding scheme and the codec model which incorporates it. Section 3 presents the implementation of an instance of our codec model along with our performance optimization efforts. Section 4 evaluates the coding scheme through a series of measurements. Finally, Section 5 discusses related work and Section 6 concludes.

2 The Codec Model

Our codec combines two independent layering facilities: spatial layering and signal-to-noise ratio (SNR) layering. Spatial layering controls the resolution of the image; subscription to more spatial layers means higher image resolution. SNR layering controls the number of bits used to represent image pixels; the more bits, the higher the pixel precision and thus a sharper and more detailed image.

The wavelet filter performs a subband decomposition of the image and decomposes it into one low (L) and one high frequency (H) subband per dimension. The effect of subband decomposition on the Lena image is depicted in Figure 2. The LL subband is a subsampled and filtered version of the original image. The three subbands HL, LH, and HH contain the high frequency information necessary to reconstruct the horizontal, vertical, and diagonal resolution, respectively. Therefore the HL, LH, and HH layers act as refinement or enhancement layers for the LL layer. Further, the LL subband can itself be subject to further subband decompositions, or can be subjected to the familiar block-based DCT coding scheme known from other codecs for further, psycho-visually enhanced compression. We use a 2-D version of the 4-tap spline wavelet [5], because it achieves good pixel value approximation and because it is highly symmetric and consequently can be implemented very efficiently.
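To make the decomposition step concrete, the sketch below computes one 2-D subband decomposition over 4x4 input blocks stepped by two pixels, in plain C. The tap values {1,3,3,1} and {1,-3,-3,1} and the divide-by-64 normalization are read off the coefficient matrix in Figure 4; this is a reference illustration under those assumptions, not the paper's VIS implementation, and border handling is omitted.

#include <stdint.h>

static const int LP[4] = { 1, 3, 3, 1 };   /* lowpass taps (assumed)  */
static const int HP[4] = { 1, -3, -3, 1 }; /* highpass taps (assumed) */

/* src: w x h 8-bit image; ll/hl/lh/hh: (w/2 x h/2) 16-bit outputs. */
void decompose(const uint8_t *src, int w, int h,
               int16_t *ll, int16_t *hl, int16_t *lh, int16_t *hh)
{
    int ow = w / 2;
    for (int y = 0; y + 4 <= h; y += 2)       /* two-pixel steps, so   */
        for (int x = 0; x + 4 <= w; x += 2) { /* blocks overlap by two */
            int sll = 0, shl = 0, slh = 0, shh = 0;
            for (int j = 0; j < 4; j++)
                for (int i = 0; i < 4; i++) {
                    int p = src[(y + j) * w + (x + i)];
                    sll += LP[j] * LP[i] * p; /* low  vert, low  horiz */
                    shl += LP[j] * HP[i] * p; /* low  vert, high horiz */
                    slh += HP[j] * LP[i] * p; /* high vert, low  horiz */
                    shh += HP[j] * HP[i] * p; /* high vert, high horiz */
                }
            int o = (y / 2) * ow + (x / 2);
            ll[o] = (int16_t)(sll / 64);      /* normalize by 8 * 8    */
            hl[o] = (int16_t)(shl / 64);
            lh[o] = (int16_t)(slh / 64);
            hh[o] = (int16_t)(shh / 64);
        }
}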

SNR layering divides the incoming coefficients into several levels of significance, whereby the lower levels include the most significant information, resulting in a coarse image representation. This is then progressively refined with the number of layers. SNR layering is a flexible and computationally efficient way of reducing the video stream bit rate, and it allows fine-grain control of image quality at different layers. In our codec, SNR layering is achieved through a DCT transform combined with repeated uniform quantization.
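A minimal sketch of the repeated quantization, in C, may help; the remainder of each pass re-enters the quantizer, as the arrow in Figure 3 depicts. The three levels match the layer design used later for testing, but the concrete step sizes are invented for illustration, since the paper does not list them.

#define NLEVELS 3
static const int STEP[NLEVELS] = { 64, 16, 4 }; /* coarse to fine (assumed) */

/* Split one transform coefficient into NLEVELS layer symbols. */
void snr_split(int coeff, int out[NLEVELS])
{
    int rem = coeff;
    for (int l = 0; l < NLEVELS; l++) {
        out[l] = rem / STEP[l];      /* quantize the current remainder    */
        rem   -= out[l] * STEP[l];   /* remainder re-enters the quantizer */
    }
}

/* Dequantize by merging however many layers were received. */
int snr_merge(const int out[NLEVELS], int nreceived)
{
    int c = 0;
    for (int l = 0; l < nreceived && l < NLEVELS; l++)
        c += out[l] * STEP[l];
    return c;
}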

Figure 3 shows our codec model. Encoding is performed by the following components. The Wavelet Transformer transforms YUV images into LL, HL, LH, and HH layers, corresponding to four spatial layers. The enhancement layers HL, LH, and HH are sent directly to the quantizer, while the LL layer is passed on to the DCT stage. The DCT Transformer performs the Discrete Cosine Transform on the LL image from above. The resulting coefficient blocks are passed on to the quantization stage.

Fig. 2. Wavelet transformation of the Lena image into LL, HL, LH, and HH layers. The LL layer is a downsampled version of the original image, and the HL, LH, and HH layers contain the information required to restore horizontal, vertical, and diagonal resolution, respectively.

The Quantizer splits incoming coefficients into several layers of significance. The base layer contains a coarse version of the coefficients, and each layer adds precision to the coefficient resolution. The Coder takes the quantized coefficients from each separate layer and Huffman compresses them to produce a single bit stream for each layer.

Fig. 3. Our codec model. In the encoding stages the wavelet transformer sends the LL layer to the DCT stage before all layers go through the quantizer and on to transmission. The quantizer reuses coefficient remainders, as depicted by the arrow re-entering the quantizer. The decoding stages run from receiver through decoder, dequantizer, inverse DCT transformer, and inverse wavelet transformer.

The decoding stages essentially perform the inverse operations. The Decoder uncompresses the received Huffman codes. The Dequantizer reconstructs coefficients based on the decoded layers; the dequantizer works both as coefficient reconstructor and as layer merger. Like the encoder, there are two different dequantizers: one for DCT coefficients, which are passed on to the IDCT stage, and one for wavelet coefficients, which are passed directly on to the wavelet reconstruction stage. The Inverse DCT Transformer performs IDCT on incoming DCT coefficient blocks, which reconstructs the lowest resolution spatial layer, LL. The LL layer is used in the Inverse Wavelet Transformer, which performs wavelet reconstruction.

The Sender and Receiver modules, which are outside the actual codec model, must support multicasting for efficient bandwidth utilization. The layers are transmitted to different multicast groups, so receivers can adjust their reception rate by joining and leaving multicast groups, as sketched below. Further details on this design can be found in [4].
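As an illustration of the receiver side, the following sketch subscribes to and drops one layer's multicast group using the standard IP multicast socket options; the group address per layer and the surrounding session management are placeholders, not part of the codec itself.

#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Join the multicast group carrying one layer (start receiving it). */
int subscribe_layer(int sock, const char *group_addr)
{
    struct ip_mreq mreq;
    memset(&mreq, 0, sizeof mreq);
    mreq.imr_multiaddr.s_addr = inet_addr(group_addr); /* layer's group */
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    return setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                      &mreq, sizeof mreq);
}

/* Leave the group (drop the layer, e.g. to reduce congestion). */
int unsubscribe_layer(int sock, const char *group_addr)
{
    struct ip_mreq mreq;
    memset(&mreq, 0, sizeof mreq);
    mreq.imr_multiaddr.s_addr = inet_addr(group_addr);
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    return setsockopt(sock, IPPROTO_IP, IP_DROP_MEMBERSHIP,
                      &mreq, sizeof mreq);
}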

Our design does not offer temporal layering besides simply dropping frames. The most effective temporal compression schemes use motion compensation, but this does not in itself add any layering, only a lower bit rate; our goal is to provide high bandwidth versatility. Moreover, it is nontrivial to obtain scalability from a motion compensated stream, because frames are interdependent: both the desired frame and the frames it depends on must be received. We have therefore decided to postpone temporal, motion compensated layering to future work.

3 Implementation

The Visual Instruction Set found on Sun's UltraSPARC CPUs is capable of processing 8-, 16-, or 32-bit partitioned data elements in parallel. In addition to the usual SIMD multiplication and addition instructions, VIS contains various special instructions dedicated to video compression and to manipulation of 2-dimensional data. Furthermore, the UltraSPARC CPU has two pipelines and is therefore able to execute pairs of VIS instructions in parallel.

The VIS instructions are available to the application programmer through a set of macros which can be used from C programs with a reasonable amount of effort, although it must be realized that even with these C macros, programming VIS is essentially programming at the assembler level. The programmer is, however, relieved of certain aspects of register allocation and instruction scheduling.

Our implementation is optimized at the architecture level as well as at the algorithmic level. The architecture level optimizations fall in three categories: (1) using VIS to achieve SIMD parallel computation of four data elements per instruction, (2) using superscalar processing to execute two independent instructions in parallel per clock cycle, and (3) reducing memory accesses by keeping constants and input data in registers and by using the 64-bit load/store capabilities. These optimization principles are applied in all stages of the codec. Below we exemplify them on the layer decomposition stage.

The 4x4 decomposition filter is applied by multiplying each pixel in a 4x4 input block with the corresponding element in the filter coefficient matrix. The results are then summed into a single value and divided by 64 to produce an average, which is the final output. This is done for each layer type. The filter dictates a two-pixel overlap between the current and the next input block, which therefore becomes the image data starting two pixels to the right relative to the current block. The implementation of the decomposition routine operates on 8x4 pixel blocks at a time (see Figure 4), producing 16-bit outputs, one for each of the LL, HL, LH, and HH layers. It uses 64-bit loads to fetch input data, and since all data fits in registers, it requires only a few 64-bit loads and stores per pixel block. The routine spills overlapping pixels from one block to the next, so horizontal traversal causes no redundant memory accesses. Vertically there is a two-pixel overlap dictated by the filter, so each pair of vertical lines is read twice.

By utilizing the VIS fmul8x16 instruction, which multiplies four 8-bit values with four 16-bit values and produces four 16-bit results, the four pixels in one row of a block can be multiplied with the corresponding row of the filter coefficients in parallel. Similarly, the next row of the block can be SIMD multiplied with its row of the filter coefficients. Moreover, both of these multiplications can execute in parallel: the superscalar processing in the UltraSPARC CPU allows execution of two independent VIS instructions in parallel per clock cycle. Two instructions are independent if the destination of one instruction is different from the source of the next instruction, as is indeed the case here.
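A hedged sketch of this inner step using the VIS C macros might look as follows; vis_fmul8x16, vis_fpadd16, and the vis_proto.h header are from Sun's VIS SDK, while the helper function and its argument layout are our own illustration, not the paper's actual routine.

#include <vis_proto.h>  /* VIS C macro interface (Sun VIS SDK) */

/* pix0/pix1: four 8-bit pixels each (two rows); c0/c1: four 16-bit
 * filter taps each; acc0/acc1: four parallel 16-bit partial sums.  */
void mul_two_rows(float pix0, float pix1, double c0, double c1,
                  double *acc0, double *acc1)
{
    double p0 = vis_fmul8x16(pix0, c0); /* 4 products for row 0       */
    double p1 = vis_fmul8x16(pix1, c1); /* 4 products for row 1; no
                                           data dependence on p0, so
                                           the two multiplies can pair */
    *acc0 = vis_fpadd16(*acc0, p0);     /* 4-lane 16-bit accumulate   */
    *acc1 = vis_fpadd16(*acc1, p1);
}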

The UltraSPARC CPU is incapable of out-of-order execution, so instructions must be carefully scheduled, either manually or by the compiler, to fully exploit instruction independence. However, we found that the compiler supplied with the platform did not do a satisfactory job on this point, and we have therefore manually organized the VIS code to pair independent instructions and thereby maximize the benefit from superscalar processing. Thus SIMD parallelism reduces the number of calculations by three quarters, and superscalar parallelism further halves this number, leaving roughly one eighth of the original work.

Fig. 4. Utilizing SIMD and superscalar processing on an 8x4 pixel block (1 byte per pixel). SIMD instructions allow elements in the same row to be multiplied with the corresponding filter coefficient row in parallel. Superscalar processing enables two rows in the same block to be computed in parallel. Thus the enclosed pixels are processed in parallel. The filter coefficient matrix shown has rows (1, -3, -3, 1), (-3, 9, 9, -3), (-3, 9, 9, -3), (1, -3, -3, 1).

At the algorithmic level, we exploit the fact that the filter coefficients for the four filters are highly symmetric. This permits computation of the LH block, after the LL block has been computed, through simple sums and differences of data already cached in CPU registers, that is, without the need for multiplication with a new set of filter coefficients. Similarly, the HH block can be derived from the HL block. Therefore all four types of data blocks are constructed from a single scan of each of the YUV picture components (Y is the luminance component; U and V are the chrominance components).
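For the vertical direction the reuse can be sketched as the following arithmetic identity in C, assuming lowpass taps {1,3,3,1} and highpass taps {1,-3,-3,1} as in Figure 4; the register-level code differs, but the point is that LH costs only sums and differences once LL's partial sums exist.

#include <stdint.h>

/* s[j]: row j of a block after horizontal lowpass filtering. */
void ll_and_lh(const int32_t s[4], int32_t *ll, int32_t *lh)
{
    int32_t outer = s[0] + s[3];       /* taps +1, +1                */
    int32_t inner = 3 * (s[1] + s[2]); /* taps +3,+3 (or -3,-3)      */
    *ll = (outer + inner) / 8;         /* vertical lowpass           */
    *lh = (outer - inner) / 8;         /* vertical highpass for free */
}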

With these optimizations, the implementation needs only a few memory accesses per pixel to create all wavelet layers, and at most a few memory accesses per pixel per reconstructed layer.

4 Performance Measurements

The input is a test video stream of fixed resolution and colour depth, digitized at a constant frame rate. The stream is a typical talking-head sequence, with one person talking and two persons entering and leaving the image as background movement. The CPU usage was measured on a Sun Ultra Creator machine with a single UltraSPARC CPU. The test video stream is stored in YUV format, so the colour space conversion stage is not included in the encoding CPU usage measurements. Likewise, the colour space conversion and display times for the decoder are not included, as together they take nearly constant time per frame.

Our codec configuration uses one level of subband decomposition and has a total of 21 layers, as seen in Figure 5. Layer 0 is the most significant, down to layer 20 as the least significant. The first 12 layers, DCT0 through DCT11, are DCT coded layers constructed from the wavelet LL layer. The remaining 9 layers, HL0-HL2, LH0-LH2, and HH0-HH2, are constructed from the corresponding wavelet enhancement layers, with 3 SNR layers from each. The most significant SNR layers (level 0) consist of the upper, most significant bits of each coefficient from each of the wavelet layers. Layers from levels 1 and 2 consist of additional bits per coefficient per layer; all three levels together thus add further bits of precision to each coefficient. The most significant layers, i.e. those on level 0, are Huffman coded. The layers on levels 1 and 2 are sent verbatim, because their contents are very random and Huffman coding would add very little, if any, compression.

Fig. 5. The layer definitions used for testing, along with their significance ordering: first the SNR layers DCT0-DCT11 from the DCT coded LL layer, then the SNR layers from the wavelet enhancement layers, grouped as level 0 (HL0, LH0, HH0), level 1 (HL1, LH1, HH1), and level 2 (HL2, LH2, HH2). HL0, LH0, and HH0 contain the most significant bits from the corresponding wavelet enhancement layers; levels 1 and 2 are refinement layers to these. The DCT layers and the level 0 layers are Huffman coded.

The resulting accumulated bit rates grow evenly from the lowest rate with one layer to the full rate with all 21 layers, as shown in Figure 6. The lowest number of layers necessary to produce clearly recognizable persons in the test movie was found to be the first few DCT layers, corresponding to a small average frame size. This amounts to a large factor of difference between the lowest capacity receiver and the highest capacity receiver, a significant variation. Sample images can be found on the World Wide Web at http://www.cs.auc.dk/~bnielsen/codec.

Fig. 6. Bandwidth distribution on layers of the 21 layer codec. (a) Average size of the individual layers (bits). (b) Accumulated average frame size (kbit) versus number of added layers.

The CPU usage measurements show average encoding times for both the C and VIS versions and average decoding times for the VIS version, and they identify the cost distribution between all codec stages. The results are summarized in Table 1. The average time the encoder needs to construct and compress all 21 layers lets it process frames at more than the real-time rate, fast enough for real-time software encoding. Similarly, the total average decoding time corresponds to a frame rate above that of the test stream. Additional time is required for colour space conversion and display drawing, but this still allows real-time decoding and display of the test video stream. Further, in most real-life applications one would rarely reconstruct all layers, meaning even faster decoding times.

Table 1. Average encoding and decoding times. The Wavelet Transform produces all wavelet layers; the DCT transforms the LL layer; the Quant+Huff (wavelet) and Quant+Huff (DCT) stages quantize and code the wavelet enhancement layers and the DCT SNR layers, respectively. The DeQuant+UnHuff (DCT) stage decodes all DCT SNR layers; DeQuant+UnHuff (wavelet) decodes the wavelet enhancement layers. Wavelet reconstruction upscales the LL image and adds resolution by reconstructing the HL, LH, and HH layers.

Encoding time (ms)                  Decoding time (ms)
Stage                   VIS   C     Stage                         VIS
Wavelet Transform                   DeQuant+UnHuff (DCT) ‡
Quant+Huff (wavelet)                IDCT †
DCT †                               DeQuant+UnHuff (wavelet) ‡
Quant+Huff (DCT)                    Wavelet reconstruction
Total                               Total

† The C and VIS versions of the DCT and IDCT functions are from Sun's MediaLib graphics library [9].
‡ Only the C version exists of the dequantization and Huffman decoding stages.

Decoding appears more expensive than encoding, which is unusual. The reason is that the decoder is designed to be very flexible with respect to which frames it decodes and in what order it does so. It permits separate decoding of each layer, which requires a separate iteration across the image per layer. This makes it easier to change the set of layers subscribed to dynamically, e.g. to reduce congestion. Also, decoding can begin as soon as a layer is received. Finally, note that two of the stages are not VIS accelerated.
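The layer-at-a-time organization can be sketched as follows in C; the types, the layer representation, and the step sizes are illustrative stand-ins for the codec's actual data structures.

#include <stdint.h>

/* Accumulate each received layer in its own pass over the plane, so
 * decoding can start as soon as any layer arrives and the number of
 * layers can change between frames.                                 */
void merge_layers(int32_t *coeff, int ncoeff,
                  int32_t **layers, const int *steps, int nreceived)
{
    for (int i = 0; i < ncoeff; i++)
        coeff[i] = 0;
    for (int l = 0; l < nreceived; l++)          /* one pass per layer  */
        for (int i = 0; i < ncoeff; i++)
            coeff[i] += layers[l][i] * steps[l]; /* add layer precision */
}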

The SIMD implementation provides a significant performance improvement. For comparison, an otherwise identical, efficient, and compiler optimized non-VIS accelerated C implementation of the encoder encodes at only a fraction of the VIS version's frame rate. A more detailed inspection of the individual stages reveals that VIS provides severalfold speedups for these particular algorithms: the wavelet transformation, the quantization and Huffman coding of the DCT coefficients, and Sun's MediaLib implementation of the DCT are each several times faster in their VIS versions than in their C versions. The overall effect of VIS acceleration is that real-time coding becomes possible, with time to spare for network communication, display updates, and other application processing.

5 Related Work

Layered video coding and related network design issues are active and recent research areas. Related work on layered video codecs exists, notably [3, 6, 11]. The codec in [6] resembles our codec in that it uses wavelets for subband decomposition and DCT coding for the LL wavelet layer. Unlike ours, their design does not allow for several levels of subband decomposition. They include temporal layering in the codec using conditional block replenishment. The authors stress the need for error resilient coding mechanisms for error prone networks such as the Internet. Their receiver-driven layered multicast (RLM) scheme, where receivers adjust their reception rate by joining and leaving multicast groups, was developed to work in environments like the MBone using IP multicasting. Although suggestions for implementation optimizations are presented in the paper, they present very little information about run-time performance.

MPEG-2 was the first standard to incorporate a form of layering, called scalability, as part of the standard [2]. But MPEG-2 is intended for higher bit-rate video streams and therefore only allows for three enhancements to the base layer, one from each scalability category: spatial, SNR, and temporal scalability. Also, no MPEG-2 implementation exists that includes the scalabilities. A combination of MPEG-2 and wavelet based spatial layering for very high bandwidth video is proposed in [11]. The video is repeatedly downsampled until the resolution reaches that of the common intermediate format (CIF). The CIF-sized LLn layer (the LL layer after n decompositions) is then further processed by an MPEG-2 based codec. Their proposal also offers hierarchical motion compensation of the high frequency subbands, but not temporal scalability.

Compression performance of our codec may be improved by using other methods than Huffman compression. One of the most efficient methods is embedded zerotree wavelet coding [8], but it is most effective when using several levels of decomposition.

6 Conclusions

This paper addresses the problem of efficiently distributing a video stream to a number of receivers on bandwidth heterogeneous networks. We propose layered coding and multicast distribution of the video. We design a proprietary video codec incorporating wavelet filtering for spatial layering and repeated quantization for SNR layering. We contribute a high performance implementation using the SIMD capabilities of our platform. Our measurements show that the codec is capable of real-time software encoding and decoding of a test movie. We found that the Visual Instruction Set accelerated stages in the codec were several times as fast as an otherwise identical, efficient C implementation; we therefore find SIMD acceleration to be worth the extra implementation effort.

As this work was carried out as part of a master's thesis, our experience suggests that the new media instructions can be successfully applied by programmers without years of signal processing experience. Indeed, in several cases the implementation was in fact simplified by using SIMD, although it requires a different line of thought. Since SIMD is integrated into virtually all modern CPU architectures, we think that developers should consider using it, given the possible speedups. Unfortunately, the SIMD engines are not compatible between CPU architectures.

References

1. Hans Eriksson. MBONE: The Multicast Backbone. Communications of the ACM, 37(8), August 1994.
2. Barry G. Haskel, Atul Puri, and Arun N. Netravali. Digital Video: An Introduction to MPEG-2. Chapman and Hall, 1997.
3. Klaus Illgner and Frank Muller. Spatially Scalable Video Compression Employing Resolution Pyramids. IEEE Journal on Selected Areas in Communications, December.
4. Morten Vadskær Jensen and Brian Nielsen. Design and Implementation of an Efficient Layered Video Codec. Technical report, Aalborg University, Dept. of Computer Science, Aalborg, Denmark, August.
5. Martin Vetterli and Jelena Kovacevic. Wavelets and Subband Coding. Prentice Hall, 1995.
6. Steven McCanne, Martin Vetterli, and Van Jacobson. Low-Complexity Video Coding for Receiver-Driven Layered Multicast. IEEE Journal on Selected Areas in Communications, August 1997.
7. Sun Microsystems. UltraSPARC and New Media Support. Whitepaper WPR.
8. Jerome M. Shapiro. Embedded Image Coding Using Zerotrees of Wavelet Coefficients. IEEE Transactions on Signal Processing, December 1993.
9. Sun Microsystems. mediaLib User's Guide, June. http://www.sun.com/microelectronics/vis/mlib_guide.pdf
10. Marc Tremblay, J. Michael O'Connor, V. Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro, August 1996.
11. Qi Wang and Mohammed Ghanbari. Scalable Coding of Very High Resolution Video Using the Virtual Zerotree. IEEE Transactions on Circuits and Systems for Video Technology, October 1997.