Error Resilience and Concealment for H.264/AVC


THESIS PROPOSAL

Vineeth Shetty Kolkeri, UTA ID 1000578225

TABLE OF ACRONYMS

B-slice bi-directional predicted slice

BMA block matching algorithm

CABAC context adaptive binary arithmetic coding

CAVLC context adaptive variable length coding

DMVE decoder motion vector estimation

FMO flexible macroblock ordering

GOP group of pictures

I-slice intra-coded slice

IDR instantaneous decoder refresh

JM joint model

MB macroblock

MMS multimedia messaging services

MTU maximum transmission unit

MV motion vector

OBMA outer boundary matching algorithm

P-slice predictive-coded slice or inter-coded slice

PCS packet switched conversational service

POC picture order count

PSS packet switched streaming service

RBMA refined block matching algorithm

Error Concealment Techniques in H.264/AVC for Video Transmission

Abstract:-

The main objective of this thesis is to implement error concealment algorithms for video transmitted over lossy packet networks. Video transmission over wireless channels is a challenging task that demands both high compression efficiency and robust system design, and both play a major role in the H.264 standardization effort for real-time and streaming applications. The main task is to provide an overview of the techniques likely to be used in wireless environments, to obtain satisfactory results, and to improve the quality of corrupted pictures.

Overview:-

Video coding standards are being developed to satisfy the requirements of a wide range of applications: better picture quality, higher coding efficiency, and greater error robustness. The recently standardized H.264 [1] offers a tremendous improvement in picture quality, coding efficiency, and error robustness compared with earlier standards. This work mainly focuses on error concealment in the new standard.

Noisy channel conditions, such as those of wireless networks, obstruct the perfect reception of the coded video bitstream at the decoder [1]. Incorrect decoding caused by lost data degrades the subjective picture quality, and the damage propagates to subsequent blocks and pictures. H.264 therefore provides several methods for error resilience against network noise.

Because of the high error rates in wireless environments, error resilience schemes are necessary for wireless networks. Encoding with a simple FMO (flexible macroblock ordering) [11] mode together with extra intra-block refreshing achieves good error-resilience capability; this approach uses spatial correlation for intra prediction and keeps the bit-rate requirements low. With this scheme, better error concealment results can be obtained at the decoder.

In real-time applications such as video conferencing and video telephony [9], transmission must be fluent, but even a small noise disturbance may cause jerky playback and affect the quality of the video. To overcome this problem, error concealment techniques are applied within the video codec at the decoder to recover the transmission quality without affecting fluency.

Introduction:-

The demand for multimedia services over wireless networks is increasing. Current and emerging wireless networks offer a variety of packet-oriented transmission modes that allow video packets to be transported over the wireless network, giving users a simple and flexible interface. In these networks the cost of transmission depends mainly on the volume of transmitted data, and both the transmission bandwidth and the transmission power are limited. Under these constraints, compression efficiency is the main target for wireless video applications, which makes H.264/AVC [1] coding a good match for wireless applications.

Video transmission for mobile wireless systems is the major application in this field. Wireless applications include MMS (multimedia messaging service), PSS (packet switched streaming service), PCS (packet switched conversational service), video telephony, storage, and broadcast. However, to allow transmission in these different environments, coding efficiency alone is not enough: the coded video must also integrate easily into current and possible future protocols, and for conversational applications the video codec's support of enhanced error resilience features is of major importance.

Three major categories of H.264/AVC video transmission can be identified:

1. Circuit-switched and packet-switched conversational services for video telephony and conferencing.
2. Live or pre-recorded video for packet-switched streaming services.
3. Video in multimedia messaging services.

Fig. 1: Wireless Video Applications MMS, PSS and PCS: differentiation by real-time or offline processing for encoding, transmission and decoding [1]

The transmission requirements of the three identified applications can be distinguished with respect to the requested data rate, the maximum allowed end-to-end delay, and the maximum jitter. In Fig. 1, encoding and decoding are performed simultaneously and in both directions. For conversational services such as video telephony and conferencing, the end-to-end delay has to be minimized to avoid perceptual disturbances and to maintain audio-video synchronization [1]. In pre-recorded video applications, the user requests pre-coded sequences stored at the server; encoding and transmission are separated, while decoding and display start during transmission to minimize the initial delay and the memory usage in mobile devices. Finally, in MMS, encoding, transport, and decoding are completely separated: the recorded video signal is encoded offline and stored locally, transmission of the stored signal can start at any time, and decoding at the receiver does not start until the download is complete.

A video is a sequence of frames, coded one frame at a time. A frame consists of one or more slices, and each slice consists of a sequence of macroblocks (MBs).

In H.264, most of the error resilience tools are implemented in the encoder, while the decoder detects which packets are lost and tries to conceal and recover them with various error concealment methods. Error concealment is therefore the most important tool at the decoder. At the encoder, a slice structure with FMO is preferred; the two main modes compared here are the interleaved FMO mode and the dispersed FMO mode.

Decoding becomes very error prone if no error resilience tools are used, mainly because error propagation and error drift cannot be removed. By using some of the resilience schemes, the error drift due to lost packets can be stopped efficiently and the decoded PSNR improves [11].

Coding Profiles:

The codec supports four main profiles: Baseline, Main, Extended, and High [3].

1. Baseline Profile offers I/P frames, and supports progressive video and CAVLC only.

2. Extended Profile offers I/P/B/SP/SI frames, and supports progressive video and CAVLC only.

3. Main Profile offers I/P/B frames, supports progressive and interlaced video, and offers CAVLC or CABAC.

4. High Profile adds to the Main profile 8x8 intra prediction, custom quantization, lossless video coding, and more YUV formats (4:4:4).

Fig 2: The specific coding parts of profiles in H.264 [3]

The Baseline profile of H.264 is used in this thesis.

The profiles share some common coding parts:

• I-slice (intra-coded slice): prediction only from decoded samples within the same slice, removing spatial redundancy.

• P-slice (predictive-coded slice or inter-coded slice): inter prediction from previously decoded reference pictures, using at most one motion vector and reference index per block to predict the sample values, removing temporal redundancy.

• CAVLC (context-based adaptive variable length coding) for entropy coding.

Fig 3: Encoder and decoder block diagrams of H.264 [3]

Different coding algorithms are used for encoding and decoding a single frame at a time; Fig. 3 shows the process. The encoder may select between intra and inter coding for block-shaped regions of each frame. Intra coding provides access points to the coded sequence where decoding can begin and continue correctly, while inter coding is more efficient because it predicts each block of sample values from previously decoded frames. For a macroblock encoded in intra mode, a prediction block is formed from previously reconstructed blocks, and the residual signal between the current block and the prediction is then encoded.

Algorithms:

Previous concealment techniques fail to give good PSNR results when macroblocks of a frame, or an entire frame, are lost, because spatial neighbours are then unavailable. In H.264, video frames can be divided into reference frames and non-reference frames. The loss of a non-reference frame [2] cannot be detected by the H.264 reference software (JM): the decoder simply decodes the next available frame in the bitstream and skips the lost one, so the output sequence has fewer frames, which can cause jitter in the display. If a reference frame is lost, the decoder stops without further decoding.

Detection of frame loss:

1. Reference frame loss detection

This is a simple and efficient way to detect frame loss at the decoder. When frames are encoded, each coded frame is assigned a value, frame_num, which is stored in the slice header. Consider a GOP (a typical group of pictures is shown in Fig. 4) whose first frame is an IDR frame with frame_num 0; frame_num is then incremented by 1 for each coded frame. When a frame is decoded, its frame_num is parsed from the slice header and compared with that of the previous frame. If the current slice is not an IDR frame and a gap of more than one is found between the two frame_num values, a frame loss is detected [2, 10].
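As an illustration, the frame_num gap check might be sketched as follows; the variable names and the wrap-around modulus are assumptions for the example, not the JM decoder's actual code.

```python
# Minimal sketch of frame_num-based loss detection, assuming frame_num values
# have already been parsed from the slice headers.
MAX_FRAME_NUM = 16  # frame_num wraps modulo 2**log2_max_frame_num; 16 is just an example

def reference_frame_lost(prev_frame_num, curr_frame_num, curr_is_idr):
    """Return True if a reference frame appears to be missing."""
    if curr_is_idr:
        return False  # an IDR frame restarts frame_num at 0
    gap = (curr_frame_num - prev_frame_num) % MAX_FRAME_NUM
    return gap > 1  # a gap larger than one frame_num step means a loss

print(reference_frame_lost(3, 4, False))  # False: consecutive frames
print(reference_frame_lost(3, 6, False))  # True: frames 4 and 5 are missing
```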

Fig 4: GOP layer: intra (I), predicted (P), and bi-directionally (B) predicted slices

2. IDR frame loss detection

For frames within the same GOP, the frame_num of the frame currently being decoded is greater than that of the previous frame. However, when an IDR frame is lost, the current frame_num is generally smaller than the previous one. So when frame_num decreases and the current frame_num is not equal to zero, an IDR frame has been lost [2].

3. Non-reference frame loss detection:

As noted earlier, the loss of a non-reference frame does not affect the normal decoding process. To detect such a loss during decoding, the decoder's internal variable POC (picture order count) can be used. The decoder assigns a POC value to each coded frame, starting from 0 at the IDR frame of a GOP; POC is mainly used for source decoding operations such as weighted prediction. Typically, the POC gap between two temporally consecutive frames remains constant within a GOP, and this property can be used to detect frame loss. Since possible losses of reference frames have already been detected and concealed based on frame_num, an unexpected POC gap at this stage indicates that a non-reference frame has been lost [2, 10].
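An illustrative sketch of the POC-based check, assuming the usual per-frame POC increment has already been observed (names are placeholders, not JM identifiers):

```python
# Sketch of POC-based detection of a lost non-reference frame.
# expected_poc_gap would normally be learned from the first frames of the GOP.
def non_reference_frame_lost(prev_poc, curr_poc, expected_poc_gap):
    """A POC jump larger than the usual per-frame increment implies a lost frame."""
    return (curr_poc - prev_poc) > expected_poc_gap

# Example: POC normally increases by 2 per frame within the GOP.
print(non_reference_frame_lost(4, 6, 2))   # False: normal gap
print(non_reference_frame_lost(4, 10, 2))  # True: a frame in between was lost
```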

Error concealment and recovery methods

Error concealment is an important feature of the H.264 test model. The main goal is to estimate the missing data and motion vectors. In I frames, a lost macroblock is often surrounded by intact macroblocks that can be used to interpolate the missing data. In P and B pictures, entire rows of macroblocks may be missing, and in this case spatial interpolation does not yield acceptable reconstructions. However, the motion vectors of the surrounding regions can be used to estimate the lost motion vectors, and the damaged region can then be reconstructed by motion-compensated interpolation [8].

The two main concealment algorithms used with the H.264 standard are weighted pixel value averaging for intra pictures and boundary-matching-based motion vector recovery for inter pictures. In the first algorithm, a macroblock that is not received is concealed from the pixel values of spatially adjacent macroblocks. If a lost MB has at least two correctly decoded neighbouring macroblocks, only those neighbours are used in the concealment; otherwise, previously concealed neighbouring MBs take part in the process. Each pixel value in the MB to be concealed is formed as a weighted sum of the closest boundary pixels of the selected adjacent MBs. The performance of this intra concealment method is shown in Figs. 5 and 6.

Fig. 5: Intra frame error concealment [1]
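A minimal sketch of the weighted pixel-value averaging idea, assuming the four neighbouring boundary rows and columns were decoded correctly; the inverse-distance weighting used here is an illustrative choice and differs in detail from the exact JM weights.

```python
import numpy as np

# Conceal a lost 16x16 intra macroblock from the boundary pixels of its
# four neighbouring macroblocks (weighted pixel-value averaging sketch).
def conceal_intra_mb(top_row, bottom_row, left_col, right_col, size=16):
    out = np.zeros((size, size), dtype=np.float64)
    for y in range(size):
        for x in range(size):
            # distances from the pixel to each of the four MB boundaries
            d_top, d_bottom = y + 1, size - y
            d_left, d_right = x + 1, size - x
            # weights are inversely proportional to the distance to each boundary
            w = np.array([1.0 / d_top, 1.0 / d_bottom, 1.0 / d_left, 1.0 / d_right])
            p = np.array([top_row[x], bottom_row[x], left_col[y], right_col[y]], dtype=np.float64)
            out[y, x] = np.dot(w, p) / w.sum()
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Example with flat neighbours: the concealed block is a smooth blend of the four values.
mb = conceal_intra_mb(np.full(16, 50), np.full(16, 200), np.full(16, 100), np.full(16, 150))
print(mb[0, 0], mb[15, 15])
```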

In the second algorithm, the motion activity of the correctly received slices of the current frame is examined first. If the lengths of the motion vector components are smaller than a predefined threshold, all lost slices are copied from the co-located positions in the reference frame. Otherwise, motion-compensated error concealment is used and the motion vectors of the lost MBs are predicted from their neighbours. The performance of this algorithm is shown in Fig. 6.

Fig. 6: Inter frame error concealment [1]
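A rough sketch of this decision, assuming a simple threshold on the average neighbouring motion vector length and a median MV predictor; both are illustrative choices, not the exact JM rules.

```python
import numpy as np

# Copy the co-located block when motion is low; otherwise conceal with a
# motion vector predicted from the neighbouring MBs.
MOTION_THRESHOLD = 2.0  # average MV component magnitude (pixels) below which copy is used

def conceal_inter_mb(ref_frame, mb_x, mb_y, neighbour_mvs, size=16):
    mvs = np.asarray(neighbour_mvs, dtype=np.float64)
    if np.mean(np.abs(mvs)) < MOTION_THRESHOLD:
        mv = np.array([0.0, 0.0])   # low motion: copy the co-located block
    else:
        mv = np.median(mvs, axis=0)  # otherwise use a predicted MV (median here)
    sx = int(round(mb_x + mv[0]))
    sy = int(round(mb_y + mv[1]))
    return ref_frame[sy:sy + size, sx:sx + size].copy()

ref = (np.arange(64 * 64) % 256).astype(np.uint8).reshape(64, 64)
patch = conceal_inter_mb(ref, 16, 16, [(1, 0), (2, 1), (1, 1)])
print(patch.shape)  # (16, 16)
```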

Error resilience scheme using data partitioning

Data partitioning encodes the macroblocks of a slice into more than one bit string per slice, assigning the symbols of the slice to separate partitions so that closely related symbols end up together. H.264 distinguishes three partition types [12].

DPA: header information (MB types, QPs, MVs). This is the most important partition; without it, the symbols of the other partitions cannot be used.

DPB: intra data (coded block pattern and transform coefficients of intra-coded MBs). It requires the availability of the DPA partition of the given slice. Intra data can stop further drift.

DPC: inter data (coded block pattern and transform coefficients of inter-coded MBs). It also requires the DPA partition.

When data-partitioned slices are transmitted over an error-prone wireless network and the intra or inter partitions are missing, the available header information can still be used to improve the efficiency of error concealment [12, 13, 14]. Fig. 7 shows the effect of data partitioning.

Figure 7a: Original image [13]
Figure 7b: Data-partitioned image [13]
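As a rough illustration of how the received partitions could drive concealment decisions; the decision logic below is a simplified assumption, not the JM decoder's actual behaviour.

```python
# Choose a concealment strategy from the partitions that actually arrived.
def concealment_strategy(have_dpa, have_dpb, have_dpc):
    if not have_dpa:
        return "conceal whole slice (header info lost, B/C partitions unusable)"
    if have_dpb and have_dpc:
        return "decode normally"
    if not have_dpb:
        return "use MVs from DPA; conceal missing intra residual"
    return "use MVs from DPA; conceal missing inter residual"

print(concealment_strategy(True, True, False))
```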

Slice Structuring

While structuring a slice, two main rules are followed (a packing sketch is given after the list):

• A group of MBs forms a slice whose size must be less than the MTU (maximum transmission unit).
• Boundary MBs are coded without spatial correlation (no intra prediction or MV prediction across the slice boundary).
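A minimal sketch of the MTU-constrained packing of MBs into slices; the byte counts, header size, and greedy rule are illustrative assumptions.

```python
# Group macroblocks into slices so that no slice exceeds the MTU.
# mb_sizes holds the coded size of each MB in bytes.
def build_slices(mb_sizes, mtu_bytes=1400, header_bytes=20):
    slices, current, used = [], [], header_bytes
    for mb_index, size in enumerate(mb_sizes):
        if current and used + size > mtu_bytes:
            slices.append(current)          # close the slice before it would exceed the MTU
            current, used = [], header_bytes
        current.append(mb_index)
        used += size
    if current:
        slices.append(current)
    return slices

print(build_slices([300, 400, 500, 350, 600, 200]))  # -> [[0, 1, 2], [3, 4, 5]]
```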

Error Resilience Scheme using FMO (Flexible Macroblock Ordering)

FMO is available in the Baseline and Extended profiles of H.264, but not in the Main profile. It allows MBs to be assigned to slices in an order other than the scan order: each MB is statically assigned to a slice group using a macroblock allocation map. In Fig. 8, all MBs of the frame are allocated to either slice group 0 or slice group 1, shown in different colours, and the frame is small enough that it fits into the two slice groups. If such a frame is transmitted over an error-prone wireless environment and the packet containing slice group 0 or slice group 1 is lost, every lost MB still has several spatial neighbours that belong to the other slice group, so an error concealment mechanism has plenty of information with which to recover the lost macroblocks efficiently [12].

Fig 8: A picture with a size of 6x4 MBs and two slice groups [12].
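The dispersed (checkerboard) allocation map of Fig. 8 can be sketched as follows for two slice groups; this is only the simplest illustrative FMO map type.

```python
# Build a dispersed (checkerboard) macroblock-to-slice-group map.
def dispersed_mb_map(mbs_wide, mbs_high, num_groups=2):
    return [[(x + y) % num_groups for x in range(mbs_wide)] for y in range(mbs_high)]

for row in dispersed_mb_map(6, 4):
    print(row)
# Every lost MB of one group is surrounded by MBs of the other group,
# so spatial concealment always has intact neighbours to work with.
```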

FMO Pattern Types


Fig 9: FMO patterns

Existing Error Concealment Methods

• Motion vector estimation: Most frames in the H.264 standard are predicted frames whose macroblocks have associated motion vectors by which they are reconstructed at the decoder [7]. An efficient way to reconstruct a lost macroblock is therefore to estimate its missing motion vector.

• Zero motion vector estimation: Replace the missing motion vector with a zero-magnitude vector, i.e. copy the macroblock at exactly the same position from a previously reconstructed reference slice. Fig. 10 shows zero-MV concealment in dispersed FMO slices.

Fig 10: Zero MV concealment in dispersed FMO slices

• Average motion vector: The lost macroblocks are predicted by averaging the motion vectors of the surrounding macroblocks and using the average motion vector to retrieve a version of the lost macroblock [5, 6]. This retrieved macroblock is then averaged with another version obtained by spatial interpolation. The representation is given in Fig. 11.

Figure 11: Spatial concealment based on weighted pixel averaging [15]

• Spatial approach: This approach is based on using a ternary tree to classify the motion vectors neighbouring the missing vector: each neighbouring motion vector is classified according to whether each of its components is positive, negative, or zero. The idea is to implicitly model the discontinuity in the motion field, similar to the approach considered in [4]. After all the neighbouring motion vectors have been classified, the class to which the missing motion vector belongs is determined by assigning a value to each class, and the motion vector that produces the macroblock with the lowest value is used to estimate the missing macroblock.

• Block matching algorithm (BMA): Under the assumption of a smooth transition between the boundary of the lost macroblock and its neighbouring MBs, BMA selects the best motion vector to conceal the lost MB from a set of candidate MVs that includes the zero MV and all eight neighbouring MVs, as shown in Fig. 13. It works well for video with uniform motion. The distortion is computed with a simple difference, which behaves like horizontal and vertical gradient edge detectors and is weak on diagonal edges. Fig. 12 shows the BMA distortion calculation; a sketch of the computation is given after the figure caption.

Fig 12: Block matching distortion calculation [16]
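A sketch of the boundary-matching distortion in Fig. 12, using only the top and left neighbour pixels to keep the example short; the function name, frame layout, and candidate set are illustrative assumptions.

```python
import numpy as np

# For each candidate MV, fetch the candidate block from the reference frame and
# measure how well its outermost pixels match the decoded pixels bordering the
# lost MB in the current frame; keep the MV with the lowest distortion.
def bma_select_mv(cur_frame, ref_frame, mb_x, mb_y, candidate_mvs, size=16):
    top_nbr = cur_frame[mb_y - 1, mb_x:mb_x + size].astype(np.int32)
    left_nbr = cur_frame[mb_y:mb_y + size, mb_x - 1].astype(np.int32)
    best_mv, best_cost = None, np.inf
    for dx, dy in candidate_mvs:
        blk = ref_frame[mb_y + dy:mb_y + dy + size, mb_x + dx:mb_x + dx + size].astype(np.int32)
        cost = np.abs(blk[0, :] - top_nbr).sum() + np.abs(blk[:, 0] - left_nbr).sum()
        if cost < best_cost:
            best_mv, best_cost = (dx, dy), cost
    return best_mv

cur = np.add.outer(2 * np.arange(64), np.arange(64))  # smooth gradient test image
ref = cur.copy()
print(bma_select_mv(cur, ref, 32, 32, [(0, 0), (4, 0), (0, 4)]))  # -> (0, 0)
```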

OBMA (outer boundary matching algorithm), also known as decoder motion vector estimation (DMVE), computes the absolute difference using the boundary pixels of the neighbouring MBs and the outer boundary of the candidate MB, as shown in Fig. 13. Both BMA and OBMA choose the candidate MV with the lowest mean absolute difference to conceal the lost MB.

Fig 13: Difference of distortion calculation [16]

Refined BMA (RBMA): In the simplest case of RBMA, the damaged macroblock is divided into four blocks, and the external boundary of the damaged MB is correspondingly divided into four segments belonging to the top-left, top-right, bottom-left, and bottom-right blocks of the MB. The motion vectors of the four blocks are estimated independently, but with some constraints to ensure the spatial coherence of the estimated motion vectors in homogeneous regions.

The temporal activity is calculated using the neighbouring blocks of the missing block, and BMA is then applied to each 8x8 sub-macroblock individually. This is illustrated in Fig. 14.

Fig 14a: BMA Fig 14b: OBMA Fig 14c: RBMA

Proposed error concealment method

Frame copy algorithm: For low-motion video, the lost frame or macroblocks are copied from the previously decoded reference frame. When a reference frame is concealed, the concealed frame is used for display and is also placed into the reference picture buffer to be used for decoding subsequent frames. If an error resilience method is used at the encoder, each slice carries more information, so when a macroblock is lost its neighbouring MBs contain enough information for the error concealment method to recover it.
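A minimal sketch of the frame-copy step, with an illustrative reference_buffer list standing in for the decoder's reference picture buffer.

```python
import numpy as np

# Replace a lost frame with the last correctly decoded reference frame and
# push the copy into the reference buffer so subsequent frames predict from it.
def conceal_lost_frame(reference_buffer):
    concealed = reference_buffer[-1].copy()   # copy the most recent reference frame
    reference_buffer.append(concealed)        # make it available for later prediction
    return concealed

ref_buffer = [np.zeros((64, 64), dtype=np.uint8)]
frame = conceal_lost_frame(ref_buffer)
print(len(ref_buffer), frame.shape)  # 2 (64, 64)
```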

For comparison of the reconstructed frames, PSNR values for the various error concealment schemes (existing and proposed) are to be calculated. The goal is to identify the algorithm that gives the highest PSNR values and hence the best quality.
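The PSNR and MSE measures used for this comparison can be sketched as follows for 8-bit video, where the peak sample value is 255.

```python
import numpy as np

def mse(original, reconstructed):
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(original, reconstructed):
    m = mse(original, reconstructed)
    return float("inf") if m == 0 else 10.0 * np.log10(255.0 ** 2 / m)

a = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
b = np.clip(a.astype(np.int32) + np.random.randint(-3, 4, a.shape), 0, 255).astype(np.uint8)
print(round(psnr(a, b), 2))  # higher PSNR means a better reconstruction
```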

Conclusions and Research Work

The proposed algorithm will conceal lost frames at the decoder and minimize the impact of packet loss on the decoding process without incurring high overhead, while preserving as much information as possible. The PSNR and MSE metrics will be used to determine the quality of the reconstructed frames. The thesis first carries out the recovery of lost frames; error resilience methods will later be applied at the encoder.

References:

[1] T. Stockhammer, M. M. Hannuksela and T. Wiegand, “H.264/AVC in Wireless Environments”, IEEE Trans. Circuits and Systems for Video Technology, Vol. 13, pp. 657- 673, July 2003.

[2] S. K. Bandyopadhyay, Z. Wu, P. Pandit and J. M. Boyce, “An Error Concealment Scheme for Entire Frame Losses for H.264/AVC”, Proc. IEEE Sarnoff Symposium, Mar. 2006.

[3] Soon-kak Kwon, A. Tamhankar and K.R. Rao, ”Overview of H.264 / MPEG-4 Part 10”, J. Visual Communication and Image Representation, vol. 17, pp.186- 216, April 2006.

[4] J. Konrad and E. Dubois, “Bayesian Estimation of Motion Vector Field”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, pp. 910-926, Sept. 1992.

[5] M. Ghanbari and V. Seferidis, “Cell-Loss Concealment in ATM Video Codecs”, IEEE Trans. Circuits and Systems for Video Technology, vol. 3, pp. 238-247, June 1993.

[6] M. Wada, “Selective Recovery of Video Packet Loss using Error Concealment,” IEEE Journal on Selected Areas in Communication, vol. 7, pp. 807-814, June 1989.

[7] Video Coding Standards 6. MPEG-1. ISO/IEC 11172-2 (’93).

[8] P.Salama, N. Shroff and E. J. Delp, “Error Concealment in Encoded Video Streams”, Proc. IEEE ICIP, Vol. 1, pp. 9-12, 1995.

[9] H. Ha, C. Yim and Y.Y.Kim, “Packet Loss Resilience using Unequal Forward Error Correction Assignment for Video Transmission over Communication Networks,” ACM digital library on Computer Communications, vol. 30, pp. 3676-3689, Dec. 2007.

[10] Y. Chen, K. Yu, J. Li and S. Li, “An Error Concealment Algorithm for Entire Frame Loss in Video Transmission”, Microsoft Research Asia, Picture Coding Symposium, Dec. 2004.

[11] L. Liu, S. Zhang, X. Ye and Y. Zhang, “Error Resilience Schemes of H.264/AVC for 3G Conversational Video”, Proc. IEEE CIT, pp. 657- 661, Sept. 2005.

[12] S. Wenger, “H.264/AVC over IP”, IEEE Trans. Circuits and Systems for Video Technology, vol. 13, pp. 645-656, July 2003.

[13] T. Aladrovic, M. Matic, and M. Kos, “An Error Resilience Scheme for Layered Video Coding”, IEEE Int. Symposium on Industrial Electronics, June 2005.

[14] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra “Overview of the H.264/AVC Video Coding Standard” IEEE Trans. Circuits and Systems for Video Technology, vol. 13, pp. 560-576, June 2003.

[15] S. Kumar, L. Xu, M. K. Mandal, and S. Panchanathan, “Error resiliency schemes in H.264/AVC standard”, J. Visual Communication and Image Representation, vol. 17, pp. 425-450, April 2006.

[16] T. Thaipanich, P.-H. Wu, and C.-C. J. Kuo, “Low-Complexity Mobile Video Error Concealment Using OBMA”, IEEE Int. Conf. on Consumer Electronics, Jan. 2008.

Reference Websites:

[1] Reference for H.264 http://www.vcodex.com

[2] 12.1 and 13.2 JM H.264 software http://iphome.hhi.de/suehring/tml/

[3] Free H.264 software www.videolan.com

[4] link for reference journal [2] https://ipbt.hhi.de/mantis/file_download.php?file_id=4&type=bug

[5] link for reference file [7] http://iphome.hhi.de/wiegand/assets/pdfs/DIC_video_coding_standards_07.pdf

[6] link for reference journal [10] www.ece.ucdavis.edu/PCS2004/pdf/ID99_error_concealment_for_entire_frame_loss1.pdf

[7] link for reference journal [16] www.viola.usc.edu/Research/Tanaphol_paper_2.pdf
