Algorithm for Detecting Video and a New Design for Handling Video Applications in Thin-Client Systems at Low Bit Rates

Umendar Koosam and Devendra Jalihal
Thin-Client Lab, Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai 600036, India

Abstract: Existing thin clients are well suited for office applications. For multimedia and gaming applications, however, their performance degrades drastically in terms of frame rate and bandwidth utilization. The encoding techniques available at the thick server do not exploit the temporal redundancy between frame updates, which is a key feature of video applications. In this paper, we propose a new method for handling video applications using a video codec, and we develop an algorithm that detects the video window at the virtual frame buffer level. Using experimental results, we compare this method with classic encoding schemes in terms of bandwidth utilization and frame rate.

I. INTRODUCTION

The present encoding techniques in the RFB (Remote Frame Buffer) protocol, on which VNC (Virtual Network Computing) thin-client systems are based [1], are rectangle-based. These techniques exploit spatial redundancy only, because they encode each frame independently; apart from frame differencing, they do not exploit temporal redundancy. Tight Encoding [2], the best of these schemes, handles color-rich areas with an image compression technique such as JPEG, but this is not an adequate solution for video applications: over slow network connections these encodings cannot keep video within the available bandwidth. This motivates the study of alternative solutions for handling video at low bit rates.

A number of solutions have been proposed that essentially include some form of video compression. We discuss two basic approaches below.

A. Direct Streaming

Streaming the video file directly to the client avoids decoding of the video by the host CPU and re-encoding by the VNC server. At the client end, the VNC viewer must be capable of decoding and displaying the video. This method uses less bandwidth than the present VNC encoding, and the server CPU load is reduced because video decoding and VNC encoding are avoided. However, the video, flash and browser (for embedded video) applications have to be modified, and the client must have decoders for all the video file formats used by the video files at the server.

B. Video Re-encoding

If the location of the video window in the desktop screen area is known to the VNC server, a video codec can be used instead of the present VNC encoding, and the client needs only the decoder corresponding to the codec used at the server. There are two methods for obtaining the video window location in the desktop screen area. Both require less bandwidth, but more server CPU resources, than the present VNC encoding, and both may degrade video quality because the video is re-encoded.

Application level Detection: The video window location is obtained from the applications. This method requires modifications to the video and browser applications.

Frame Buffer level Detection: The presence of video is detected from the virtual frame buffer. In this method, video and browser applications need not be modified.

Both methods work for all video file formats, and the client does not need decoders for every file format. Both methods can also be used for flash and gaming applications.

D. De Winter et al. [3] used the H.264 video codec to stream the graphical output of applications, after GPU (Graphics Processing Unit) processing, to a thin-client device. The complexity of the graphical commands or the amount of motion in the desktop screen is used to switch between a real-time video codec and VNC encoding. However, performing motion detection and encoding of the entire desktop screen area (even for small video sizes) is not an optimal solution.

Solutions that require modifying the applications are not practical for proprietary video players such as RealPlayer and QuickTime Player. Modifications to open-source players such as xine, flash, etc. are possible; however, modifying every new open-source player, and every new version of each, is impractical from a maintenance point of view. Considering these facts, we detect the video at the frame buffer level and propose to use a variant of the H.263 video codec [4] for the video portion because of its lower complexity, as shown in Figure 1.

Figure 1: Proposed method for handling the video portion in a thin-client system.

A frame buffer update is sent from the server to the client upon request. The applications write data into the virtual frame buffer at the thick server, and the data is represented in terms of rectangles. The video portion is detected from the virtual frame buffer content and encoded using a video codec. The remaining rectangles of the frame buffer (other than the video portion) are encoded with normal Tight Encoding.

The remainder of this paper is organized as follows. Section II describes the new video window detection algorithm. Section III presents the experimental results, followed by the conclusion in Section IV.

II. VIDEO WINDOW DETECTION ALGORITHM

The present frame update is the difference between the previous and present screen information and is represented in terms of rectangles <xi, yi, wi, hi>, i = 1, 2, ..., N, where (xi, yi) is the position of the rectangle in the screen area and wi, hi are its width and height. The algorithm for detecting the video window in the frame buffer data is shown as a flow chart in Figure 2.

Figure 2: Video window detection algorithm flow chart.

First, we define the video window as a rectangular portion that is very dense in color, is updated in consecutive frame updates with the same <x, y, w, h>, and whose successive contents are correlated images. Rectangles with w < Wmin and h < Hmin are filtered out as non-video rectangles. We set Wmin = 176 and Hmin = 144 (the QCIF frame size).

The steps of the video window detection, described in detail below, are as follows. First, the image is checked for whether it is a picture (natural image). Second, the temporal correlation between the two picture images of the present and previous frame updates is measured. Finally, the correlated images are checked for motion due to scrolling operations. Once the video window has been detected over a span of k frames, the video flag (VF) is set, and the detection algorithm is thereafter bypassed for that video window <x, y, w, h>. We assume that only one video portion is present in the screen area.

The video window detection algorithm processes the frame buffer data in the RGB color domain without converting it to grayscale, which saves considerable computation in real time.
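The control flow around the per-update tests (size filtering, detection over k frames, the VF latch, and the routing of Figure 1) can be sketched as below. This is a hypothetical sketch, not the authors' code: the class `VideoWindowDetector`, its method `route`, and the `looks_like_video` callback are invented names. The text gives no value for k and does not say when VF is cleared, so k is a constructor argument and this sketch never clears VF. "A span of k frames" is interpreted here as k consecutive updates in which the same rectangle passes the tests, and the size filter follows the text literally, discarding a rectangle only when both w < Wmin and h < Hmin.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h), the text's <x, y, w, h>


@dataclass
class VideoWindowDetector:
    """Hypothetical wrapper: latches VF after k consecutive detections of one window."""
    k: int                       # span of frames required (value not given in the text)
    w_min: int = 176             # Wmin
    h_min: int = 144             # Hmin
    video_flag: bool = False     # VF
    video_rect: Optional[Rect] = None
    _candidate: Optional[Rect] = None
    _count: int = 0

    def route(self, rects: List[Rect],
              looks_like_video: Callable[[Rect], bool]) -> Tuple[Optional[Rect], List[Rect]]:
        """Return (rectangle for the video codec or None, rectangles for Tight encoding)."""
        if not self.video_flag:
            self._detect(rects, looks_like_video)
        if self.video_flag and self.video_rect in rects:
            # VF is set: detection is bypassed and the known window goes to the codec.
            return self.video_rect, [r for r in rects if r != self.video_rect]
        return None, list(rects)

    def _detect(self, rects: List[Rect], looks_like_video: Callable[[Rect], bool]) -> None:
        found: Optional[Rect] = None
        for r in rects:
            _, _, w, h = r
            if w < self.w_min and h < self.h_min:
                continue  # filtered out as a non-video rectangle
            if looks_like_video(r):  # picture, correlation and scrolling tests
                found = r
                break  # the text assumes only one video portion on screen
        if found is not None and found == self._candidate:
            self._count += 1
        else:
            self._candidate = found
            self._count = 1 if found is not None else 0
        if found is not None and self._count >= self.k:
            self.video_flag, self.video_rect = True, found
```

With k = 3, a 320x240 window that passes the tests in three consecutive updates is sent to the codec from the third update onward, while a 32x32 icon in the same updates always goes to Tight encoding. In a real server, `looks_like_video` would compare the rectangle's current pixels with the same rectangle in the previous update.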

A. Detection of picture/natural image

Lin T. et al. proposed color-count-based classification of text blocks for compound image compression [5]. We propose a mean-RGB-based classification of picture and non-picture blocks. The steps are given below.

Step 1: The image is divided into non-overlapping blocks of size MxM. We set M = 12.

Step 2: Each 12x12 block is subdivided into nine 4x4 sub-blocks, arranged as a 3x3 grid.

Step 3: The mean R, G and B values of each sub-block are computed as:

$$M_r = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4} R(i,j), \qquad M_g = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4} G(i,j), \qquad M_b = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4} B(i,j) \qquad (1)$$

where R, G and B are the color component matrices of the image over the sub-block.

Step 4: The 12x12 block is classified as non-picture if the absolute differences of Mr, Mg and Mb between any two adjacent (horizontal or vertical) 4x4 sub-blocks are zero; otherwise it is classified as a picture block.

Step 5: If the number of picture blocks is greater than a threshold Tpict, the image is classified as a picture/natural image. We set Tpict = 80% of the total number of blocks.

Most of the documents handled in typical computer usage are text pages, web pages, picture images and video files. We observed that picture/natural images do not have the same mean R, G and B values in two adjacent 4x4 blocks. Experiments were conducted on images of each category using the above classification algorithm, and the results are shown in Table 1. This step separates text pages and web pages from the group of color-rich picture images and video frames.

Table 1: Average percentage of picture blocks and non-picture blocks in various images

Block type            Text Pages   Web Pages   Video Images
Picture Blocks        11           18          92
Non Picture Blocks    89           82          8

B. Correlation measurement between two picture images

Step 1: The two images are divided into non-overlapping blocks of size NxN.

Step 2: The mean R, G and B values of each block are computed.

Step 3: The absolute differences of the mean R, G and B values of corresponding blocks in the two images are computed.

Step 4: Let $R_{1,i}, G_{1,i}, B_{1,i}$ and $R_{2,i}, G_{2,i}, B_{2,i}$ be the mean R, G and B values of block i in the two images, respectively. Two corresponding NxN blocks are said to be correlated if the following conditions are satisfied:

$$|R_{1,i} - R_{2,i}| < \mathit{Th}, \qquad |G_{1,i} - G_{2,i}| < \mathit{Th}, \qquad |B_{1,i} - B_{2,i}| < \mathit{Th} \qquad (2)$$

The threshold Th is set to 25 for N = 16. Figure 3 shows the probability of miss and the probability of false alarm for different thresholds.

Step 5: If the number of correlated blocks is greater than the number of uncorrelated blocks, the two images are classified as temporally correlated.

In this step, series of picture images (e.g. JPEG, BMP) and slide shows of complex text documents, which are uncorrelated, are classified as non-video rectangles.
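The three per-update tests can be sketched in plain Python as below: the picture test of Section II.A, the correlation test of Section II.B, and the strip-matching check for motion due to scrolling described next. This is a minimal sketch under stated assumptions, not the authors' implementation. Images are lists of rows of (r, g, b) tuples; all function names (`block_sum`, `is_picture_image`, `are_correlated`, `is_vertical_scroll`, `is_horizontal_scroll`, `looks_like_video`, `noise`) are hypothetical; blocks that do not fit entirely inside the image are skipped; "zero difference" in Step 4 of Section II.A is taken to mean that all three means are equal; and the scrolling search skips the zero offset, since an unchanged strip is not evidence of scrolling. The parameters M = 12, Tpict = 80%, N = 16, Th = 25, the 16-row strip and the ±16-row search range follow the text.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # img[row][col] = (r, g, b), 8-bit components


def block_sum(img: Image, top: int, left: int, h: int, w: int) -> Tuple[int, int, int]:
    """Sums of R, G and B over the h x w block whose top-left pixel is (top, left)."""
    sr = sg = sb = 0
    for row in img[top:top + h]:
        for r, g, b in row[left:left + w]:
            sr, sg, sb = sr + r, sg + g, sb + b
    return sr, sg, sb


def is_picture_image(img: Image, m: int = 12, sub: int = 4, t_pict: float = 0.8) -> bool:
    """Section II.A: picture test on M x M blocks using 4 x 4 sub-block mean RGB."""
    k = m // sub  # 3 x 3 grid of sub-blocks
    picture = total = 0
    for top in range(0, len(img) - m + 1, m):  # partial edge blocks skipped (assumption)
        for left in range(0, len(img[0]) - m + 1, m):
            # Same-size sub-blocks have equal mean R, G, B exactly when their sums are equal.
            s = [[block_sum(img, top + i * sub, left + j * sub, sub, sub)
                  for j in range(k)] for i in range(k)]
            flat = any((j + 1 < k and s[i][j] == s[i][j + 1]) or
                       (i + 1 < k and s[i][j] == s[i + 1][j])
                       for i in range(k) for j in range(k))
            total += 1
            if not flat:
                picture += 1
    return total > 0 and picture > t_pict * total


def are_correlated(a: Image, b: Image, n: int = 16, th: float = 25.0) -> bool:
    """Section II.B: correlated if more N x N blocks satisfy Eq. (2) than fail it."""
    corr = uncorr = 0
    for top in range(0, len(a) - n + 1, n):
        for left in range(0, len(a[0]) - n + 1, n):
            sa, sb = block_sum(a, top, left, n, n), block_sum(b, top, left, n, n)
            if all(abs(x - y) / (n * n) < th for x, y in zip(sa, sb)):
                corr += 1
            else:
                uncorr += 1
    return corr > uncorr


def is_vertical_scroll(cur: Image, ref: Image, strip: int = 16, search: int = 16) -> bool:
    """Scrolling check: does the centre strip of cur reappear, shifted, with zero MAD in ref?"""
    rows, cols = len(cur), len(cur[0])
    top = (rows - strip) // 2
    for dy in range(-search, search + 1):
        rt = top + dy
        if dy == 0 or rt < 0 or rt + strip > rows:
            continue  # zero offset skipped (assumption); candidate must fit inside ref
        sad = [0, 0, 0]  # per-plane sum of absolute differences
        for y in range(strip):
            for p, q in zip(cur[top + y], ref[rt + y]):
                for c in range(3):
                    sad[c] += abs(p[c] - q[c])
        if all(s / (strip * cols) == 0 for s in sad):  # MAD_r = MAD_g = MAD_b = 0
            return True
    return False


def is_horizontal_scroll(cur: Image, ref: Image, strip: int = 16, search: int = 16) -> bool:
    """The rows x 16 vertical-strip test, done as the vertical test on transposed images."""
    def transpose(img: Image) -> Image:
        return [list(col) for col in zip(*img)]
    return is_vertical_scroll(transpose(cur), transpose(ref), strip, search)


def looks_like_video(cur: Image, prev: Image) -> bool:
    """Picture test, then temporal correlation, then rejection of pure scrolling."""
    return (is_picture_image(cur) and is_picture_image(prev)
            and are_correlated(cur, prev)
            and not is_vertical_scroll(cur, prev)
            and not is_horizontal_scroll(cur, prev))


def noise(rows: int, cols: int, seed: int, lo: int = 0, hi: int = 255) -> Image:
    """Random RGB test image, used here as a stand-in for natural-image content."""
    rng = random.Random(seed)
    return [[(rng.randint(lo, hi), rng.randint(lo, hi), rng.randint(lo, hi))
             for _ in range(cols)] for _ in range(rows)]
```

For example, a 64x64 `noise` frame followed by a copy with small per-pixel changes passes all three tests, while the same frame scrolled up by five rows passes the correlation test and is then rejected by the scrolling check. The sketch recomputes the 16x16 block sums for clarity; the reuse the text mentions is possible because each 16x16 block mean is the average of its sixteen 4x4 sub-block means.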

Figure 3: Probability of miss (a) and probability of false alarm (b) for the detection of correlated images for different thresholds.

C. Detection of motion due to scrolling

The above steps of the algorithm may misclassify a text document with a complex color background as a picture image. Consecutive frame updates are correlated during text document operations such as scrolling, so such updates would be treated as a video frame sequence. However, the movement in scrolling operations is strictly unidirectional (horizontal or vertical), with equal displacement for all pixels, whereas motion in video frames is not predictable. Using this clue, we avoid misclassifying scrolling of complex text documents and natural images as video frames. We used a simple search mechanism:

Step 1: For vertical scrolling, a horizontal strip of size 16×cols (cols = total number of columns of the image) is taken from the centre of the image.

Step 2: A Mean Absolute Difference (MAD) search is used to match this strip in the reference image within a search range of 16 rows above and below the strip position.

Step 3: MAD is computed for the three color planes as follows:

$$\mathit{MAD}_r = \frac{1}{size}\sum_{x}\sum_{y}\left|R_i(x,y) - R_j(x,y)\right|, \qquad \mathit{MAD}_g = \frac{1}{size}\sum_{x}\sum_{y}\left|G_i(x,y) - G_j(x,y)\right|, \qquad \mathit{MAD}_b = \frac{1}{size}\sum_{x}\sum_{y}\left|B_i(x,y) - B_j(x,y)\right| \qquad (3)$$

where Ri, Gi, Bi and Rj, Gj, Bj are the color component matrices of the two images over the strips being compared, and size = 16 × cols.

Step 4: For vertical scrolling of complex text documents and images, the MAD at the matching position is zero for all of the R, G and B planes, whereas for video frames it is non-zero.

Step 5: Similarly, for horizontal scrolling, a vertical strip of size rows×16 (rows = total number of rows of the image) is taken from the centre of the image and MAD matching is used, with size = rows × 16.

The video window detection algorithm incorporated in the proposed solution detects individually played video files, video embedded in browsers, and flash videos very well. Text pages, web pages and picture images are classified as non-video rectangles.

The proposed mean-RGB-based classification of picture and non-picture blocks is computationally more efficient than classification based on color count. The color count method involves searching and comparisons, whereas the mean RGB method involves only additions and comparisons. Moreover, applying the color count method to every frame update makes it inefficient in a real-time scenario. The mean RGB method also saves considerable computation by reusing the mean RGB values calculated in the first step of the algorithm in the second step.

III. EXPERIMENTAL RESULTS

The test machines are a P4 (2.4 GHz, 512 MB RAM) for the server and a P4 (1.3 GHz, 256 MB RAM) for the client, both running Red Hat Linux 9 with kernel 2.4.20-8. The TightVNC 1.2.9 UNIX version [6], together with the H.263 video codec, is used in the test VNC system. The video codec is set with a key frame interval of 15 and a quantization step size of 13. Experiments are conducted over LAN (100 Mbps), 512 kbps, 256 kbps and 128 kbps network connections. Table 2 shows the number of frame updates per second for the proposed method and TightVNC at different bandwidths. Table 3 shows the network bandwidth utilization of the new scheme compared with the previous encoding schemes over a 100 Mbps LAN. The results are for a 320x240 video clip played at 30 fps using mplayer.

Table 2: Number of frame updates per second at different bandwidths

Encoding                             100 Mbps   512 kbps   256 kbps   128 kbps
Tight Encoder                        20–21      5          3          1
Proposed (Tight with video codec)    22–23      16–18      11–12      5–6

It is apparent from the experimental results that video-rich portions are best handled by a video codec. The number of frame updates per second and the network bandwidth utilization are greatly improved compared with the previous VNC encoding schemes.

Table 3: Bandwidth utilization for various encoding schemes over a 100 Mbps LAN

Encoding Type             Network bandwidth usage
Raw                       11.2 MB/sec
Hextile                   4.9 MB/sec
Zlib                      2.8 MB/sec
Tight                     226 KB/sec
Tight with video codec    48 KB/sec

IV. CONCLUSION

We developed a new architecture for handling the video portion of the desktop screen area using the H.263 video codec in thin-client systems. A low-complexity video window detection algorithm was designed and implemented for real-time use. The algorithm was tested in different situations: it correctly classified still images, complex colored text pages and scrolling pictures as non-video rectangles, and it correctly classified video portions as video windows. We compared this method with the previous VNC encoding schemes in terms of frame rate and network bandwidth utilization. The proposed method outperforms the classic VNC encodings, with the largest improvement at low bandwidth connections. At 512 kbps, video codec + Tight encoding achieves 18 frame updates per second, whereas Tight encoding alone achieves only 5, and the round-trip delay encountered is 50-60 milliseconds. Network bandwidth utilization is reduced by more than 70% compared with VNC Tight encoding (from 226 KB/sec to 48 KB/sec in Table 3, a reduction of about 79%). We used the baseline H.263 video codec, which has low complexity compared with the H.264 codec used in [3]. H.264 gives high coding gain and quality at the cost of high computational resources and is not suitable for encoding the desktop screen for VNC at low bit rates.

REFERENCES

[1] Tristan Richardson, "The RFB Protocol," Version 3.8, October 2006.
[2] K. V. Kaplinsky, "VNC tight encoder - data compression for VNC," MTT 2001, Proceedings of the 7th International Scientific and Practical Conference of Students, Post-graduates and Young Scientists, pp. 155-157.
[3] D. De Winter et al., "A hybrid thin-client protocol for multimedia streaming and interactive gaming applications," NOSSDAV, May 2006.
[4] ITU Telecommunication Standardization Sector, "Video coding for low bit-rate communication," ITU-T Recommendation H.263, March 1996.
[5] T. Lin and Pengwei Hao, "Compound image compression for real-time computer screen image transmission," IEEE Transactions on Image Processing, vol. 14, pp. 993-1005, August 2005.
[6] http://www.tightvnc.com/