
Implementation and Quality Evaluation of Video Telephony Using Session Initiation Protocol

Taweesak Samanchuen and Supaporn Kiattisin
Technology of Information System Management Division, Faculty of Engineering, Mahidol University, Puttamonthon, Nakorn Pathom 73170, Thailand
E-mail: t.samanchuen@.com, Phone: (662) 889-2138 ext. 6300

Abstract: Video telephony is a major change in communication technologies and has become a standards-based technology available on personal computers and mobile phones today. It has been developed using several advanced technologies, which can be simplified into two fundamental parts, i.e., server and client. The server keeps the clients connected to each other, while the client captures audio and video data for transmission and converts the received data back into audio and video signals. In our implemented system, a Session Initiation Protocol (SIP) server is used as the video telephony server. Video codecs such as VP8, MPEG4, H263, and H264 are used for video compression in the client. The performance of each codec is evaluated by objective and subjective quality tests. The results show that the implemented system works properly, and the video quality of each codec is also evaluated.

I. INTRODUCTION

The Internet connects people through various communication channels such as e-mail, chat programs, social networking, voice over IP (VoIP), and video telephony. Video telephony is a communication channel that can serve both personal and official purposes. It lets users communicate in real time and feel as if they are sitting in front of each other. Popular free video telephony services today include "Skype", acquired by Microsoft Corporation, and "FaceTime", developed by Apple Inc. Although these services are free of charge, users must sign up as members and connect to the providers' servers over the Internet. Because of this Internet connection requirement, they cannot be used to set up a private video telephony system within a LAN or WAN for a company work group.

Fig. 1: Simple block diagram of the video telephony system (Client A and Client B connect through the Internet to a video telephony server).

The video telephony system uses several advanced technologies. Fig. 1 shows a simple block diagram of the video telephony system, which consists of a video telephony server and clients. The clients can be any kind of smart device, such as a personal computer (PC), smart phone, or tablet, equipped with a camera, a microphone, a display, and a speaker. The server handles client registration and call initiation, while the client acts as the end device, i.e., it initiates the call request, captures voice and video, and performs digital signal processing.

The video telephony server can be built on several protocols, such as H.323, the Mobile Status Notification Protocol (MSNP), the Extensible Messaging and Presence Protocol (XMPP), or the Session Initiation Protocol (SIP). SIP has become dominant in the telephony area because of its text-based protocol architecture, which allows new services to be introduced faster and with greater flexibility than many alternative multimedia communication protocols in use today [2], [3]. In our study, a SIP server is used as the video telephony server.

Another important part of video telephony is video compression, which can be split into two processes: encoding and decoding. Many compression standards are currently available for coding video and audio. MPEG-1 (1993) and MPEG-2 were developed by the Moving Picture Experts Group (MPEG). The ITU-T Video Coding Experts Group (VCEG) was responsible for H.261, which was widely used for video conferencing and was later succeeded by H.263. These two groups then collaborated as a joint video team and prepared the international standard H.264/MPEG-4 Part 10 (AVC). An overview of the technical features of H.264/AVC can be found in [4] and [5]. In May 2010, a new open media project called "WebM" was started, dedicated to developing an open media format for the web that is available to everyone. At the core of the project is a new open-source video compression format, VP8. This format was originally developed by a small research team as a successor to that team's VPx family of video codecs. Compared with other video coding formats, VP8 has many distinctive technical features that help it achieve high compression efficiency and, at the same time, low computational complexity for decoding.

The objective of our work is to implement a video telephony system for wireless networks and to evaluate the performance of audio and video coding techniques in the video telephony environment. The paper is organized as follows. Section II explains how the performance of video telephony is evaluated. Section III describes the system implementation. Experimental results and conclusions are given in Sections IV and V, respectively.

978-616-361-823-8 © 2014 APSIPA APSIPA 2014

Fig. 2: Simple block diagram of the implemented system (SIP server on Linux OS; softphones on a Microsoft Windows PC and an iOS smart phone; access point; router).

TABLE I: Hardware specification
Device                 CPU                           RAM
Server                 Intel Core i3, 2.5 GHz        4 GB
Client (PC)            Intel Atom N270, 1.667 GHz    4 GB
Client (smart phone)   ARM Cortex-A8, 1 GHz          512 MB

II. VIDEO TELEPHONY PERFORMANCE EVALUATION

Video telephony involves two perceived signals, i.e., audio and video, and both must be considered together when evaluating overall performance. As described in [6], quality measurement of video telephony can be classified into two groups, i.e., subjective quality and objective quality.

A. Subjective Quality

Subjective quality is the quality of the signal as perceived by people, measured through an assessment. In a subjective video quality experiment, a number of people are asked to watch a set of video clips and rate their quality. The average rating that viewers give to a clip is known as the Mean Opinion Score (MOS). Many subjective testing methods exist, and the International Telecommunication Union (ITU) has published them in various recommendations [7]-[9]. These recommendations specify standard viewing conditions, criteria for selecting observers and test material, assessment procedures, and data analysis methods. They include implicit comparisons such as the Double Stimulus Continuous Quality Scale (DSCQS) and explicit comparisons such as the Double Stimulus Impairment Scale (DSIS). More details on subjective testing can be found in [10].

Subjective testing is very important for multimedia quality assessment. Its main weakness is that it requires a large number of testers, which may limit the amount of video material that can be rated in a reasonable time. Nevertheless, subjective testing remains the benchmark for all objective quality metrics.

B. Objective Quality

A well-known objective quality metric for video processing is the peak signal-to-noise ratio (PSNR). PSNR is popular for several reasons, but its most outstanding feature is that its formula is very simple to understand and implement. For a video sequence of K frames with original M × N images u, PSNR is defined as the ratio of the total maximum (peak) pixel power to the total mean squared error (MSE):

$$\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\frac{1}{KMN}\sum_{k,m,n}\bigl(u(k,m,n)-\hat{u}(k,m,n)\bigr)^2}, \qquad (1)$$

where 255 is the maximum pixel value, so that 255² is the peak pixel power; u(k,m,n) is the nth pixel of the mth row of the original image at time index k; and û(k,m,n) is the corresponding pixel in the reproduced image [11].

Because PSNR is based on a byte-by-byte comparison of the data without considering what the video actually represents, it has only an approximate relationship with perceived video quality. To overcome this problem, other metric designs have been proposed [12], [13]. They can be classified into two groups, i.e., the vision modeling approach and the engineering approach, which are described in more detail in [13]. However, to keep the tests simple, PSNR is used in this work.

Fig. 3: Network testbed for measurement (SIP server; video-call SIP clients as transmitter and receiver, connected through a network emulator; Iperf client and Iperf server).

Fig. 4: Example image from the implemented video telephony system.

TABLE II: Test conditions
Parameter            Value
Video resolution     CIF (352 × 288)
Frame rate           5-30 fps
Video bit rate       100-1000 kbps
Codec                H263, H264, MPEG4, VP8
Video sequences      Foreman, Akiyo
Packet loss rate     0-0.05

III. SYSTEM IMPLEMENTATION

As shown in Fig. 1, the system has two parts to implement: client and server. PCs and smart phones are used as clients, while the server is implemented on a PC. The implemented system is shown in Fig. 2, and its hardware specification is given in Table I.

Several open-source projects are applied to the parts of the system. On the server side, the open-source project "Asterisk" [14] is used as the SIP server. On the client side, the open-source project "Linphone" [15] is used as the video softphone. Linphone runs on different kinds of platforms, such as PC, tablet, and smart phone, which makes it suitable for a wireless network system. For the SIP server to function correctly and enable video calls, its configuration must be set properly. The configuration of our server is as follows:

; sip.conf
[general]
port=5060
tcpenable=yes
tcpbindaddr=0.0.0.0
bindaddr=0.0.0.0
allowoverlap=no
videosupport=yes
allow=h263
allow=h264
allow=VP8
allow=MPEG4
context=default

[2000]
type=friend
secret=1234
host=dynamic

[2001]
type=friend
secret=1234
host=dynamic
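The listing above has a simple layout: a [general] section with server-wide settings, including one allow= line per enabled video codec, and one section per SIP account. The Python sketch below reads such a fragment into a dictionary so the enabled codecs and the peer names can be inspected. parse_sip_conf and peers are hypothetical helpers written for illustration; they are not part of Asterisk and handle only section headers, key=value lines and ';' comments, not Asterisk extras such as templates or #include.

```python
from collections import defaultdict


def parse_sip_conf(text):
    """Parse a simple Asterisk-style sip.conf fragment (illustrative sketch).

    Returns {section: {key: [values, ...]}}. Keys repeated within a section,
    such as allow=, keep every value in file order. Only [section] headers,
    key=value lines and ';' comments are handled; templates, #include and
    escaped semicolons are not.
    """
    sections = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # drop comment and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, defaultdict(list))
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            sections[current][key.strip()].append(value.strip())
        else:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
    return sections


def peers(sections):
    """Names of all sections other than [general], i.e. the SIP accounts."""
    return [name for name in sections if name != "general"]
```

Applied to the configuration above, parse_sip_conf(text)["general"]["allow"] gives ["h263", "h264", "VP8", "MPEG4"], and peers() returns ["2000", "2001"]. With host=dynamic, these are presumably the accounts that the two softphones register with, using the shared secret.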

The configuration shows that each audio and video codec must be enabled explicitly before use. The latest well-known video codecs, i.e., VP8, MPEG4, H263, and H264, are used for comparison. PCs and smart phones are used as clients. For performance evaluation, the transmitted video content must be consistent and repeatable, which is achieved by using a virtual video camera tool [16] to inject the video sequences. The testing system is shown in Fig. 3, and the test conditions are listed in Table II. As Fig. 3 shows, all software components are connected via a network emulator [17], which can control network variables such as bandwidth, packet loss rate, and latency. Background traffic is also controlled using "Iperf" [18]. The video sequences "Foreman" and "Akiyo" are injected into the transmitter and receiver, respectively. Example images of the transmitter side are shown in Fig. 4; each image consists of two parts, a self-view part and a main-view part. To evaluate the received video quality, only the main view is needed, so the self-view is disabled.

IV. EXPERIMENTAL RESULTS

As the implemented system in Fig. 2 shows, the clients are a PC and a smart phone. When testing with the smart phone, the system works properly, but the objective quality cannot be measured because no measuring tool is available on the smart phone. Because processing capability is not a factor measured in this test, the smart phone was replaced with a PC of the same specification as the client PC in Table I. The test conditions are then set as in Table II.

PSNR is used to investigate the impact of packet loss rate and bandwidth on video quality. In the first test, the bandwidth is 1 Mbps on both uplink and downlink. The results are shown in Fig. 5: at the same packet loss rate, H264 provides the best PSNR compared with H263, MPEG4, and VP8.

Fig. 5: Effect of packet loss on video quality (PSNR in dB versus packet loss rate, 0-0.05, for H264, H263, MPEG4, and VP8).

The second test is performed by controlling the network bandwidth. The bandwidth is varied from 200 kbps to 1000 kbps while the other parameters remain the same. The results for each codec are shown in Fig. 6: H264 provides the best PSNR compared with H263, MPEG4, and VP8 at the same network bandwidth.

The last test is a subjective quality assessment, performed following ITU-T Recommendations [8] and [9]. A total of 8 subjects (5 males, 3 females), aged 20 to 36 years, participated in the test. The test videos are presented one at a time, and each is rated independently on a discrete 5-level scale from "bad" to "excellent". The test is run under the conditions in Table II while varying the video codec. The results are shown in Fig. 7: H264 provides the best MOS compared with the other video codecs under the same conditions.

Fig. 7: MOS of various video codecs under the test conditions in Table II (MOS versus packet loss rate, 0-0.05, for H264, H263, MPEG4, and VP8).
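The objective results in Figs. 5 and 6 rely on the sequence-level PSNR of Eq. (1). The following pure-Python sketch implements that formula directly. It assumes each frame is an M × N grid of 8-bit values stored as nested lists. That layout is an illustrative assumption, not the format used by the authors' measurement setup.

```python
import math
from typing import Sequence

# Assumed layout: K frames, each an M x N grid (list of rows) of 8-bit values.
Frame = Sequence[Sequence[int]]


def sequence_psnr(original: Sequence[Frame], reproduced: Sequence[Frame],
                  peak: int = 255) -> float:
    """Sequence PSNR as in Eq. (1): peak power (peak**2) divided by the MSE
    pooled over all K frames, M rows and N columns, in decibels."""
    if len(original) != len(reproduced):
        raise ValueError("sequences must have the same number of frames")
    squared_error = 0
    samples = 0  # equals K * M * N when every frame is M x N
    for frame_u, frame_r in zip(original, reproduced):
        if len(frame_u) != len(frame_r):
            raise ValueError("frames must have the same number of rows")
        for row_u, row_r in zip(frame_u, frame_r):
            if len(row_u) != len(row_r):
                raise ValueError("rows must have the same number of pixels")
            for u, r in zip(row_u, row_r):
                squared_error += (u - r) ** 2
                samples += 1
    if samples == 0:
        raise ValueError("empty sequence")
    mse = squared_error / samples
    if mse == 0:
        return math.inf  # identical sequences: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)
```

Eq. (1) pools the squared error over all K frames before taking the logarithm. Averaging per-frame PSNR values instead generally gives a different number. As a small check, consider a single 2 × 2 frame in which two pixels differ by 2. The MSE is 8/4 = 2, so the PSNR is 10 log10(255²/2), or about 45.12 dB.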

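Fig. 7 reports MOS values. Each is the average rating that a test condition received from the subjects (Section II-A). The sketch below shows this aggregation in Python. It assumes the five labels of the ITU-T ACR scale (bad, poor, fair, good, excellent, scored 1 to 5). The paper names only the endpoints "bad" and "excellent", so the intermediate labels are an assumption. The sample standard deviation is included as a common companion statistic, not as a value the paper reports.

```python
from statistics import mean, stdev

# Assumed five-level ACR labels and scores: 5 = excellent ... 1 = bad.
ACR_SCORES = {"excellent": 5, "good": 4, "fair": 3, "poor": 2, "bad": 1}


def to_scores(labels):
    """Convert rating labels (case-insensitive) to numeric scores."""
    return [ACR_SCORES[label.strip().lower()] for label in labels]


def mos(labels):
    """Mean Opinion Score: the average score all viewers gave one clip."""
    scores = to_scores(labels)
    if not scores:
        raise ValueError("at least one rating is required")
    return mean(scores)


def mos_table(ratings):
    """Map each test condition to (MOS, sample standard deviation).

    `ratings` maps any hashable condition label, e.g. a hypothetical
    ("H264", 0.01) codec/packet-loss pair, to that condition's rating labels.
    """
    table = {}
    for condition, labels in ratings.items():
        scores = to_scores(labels)
        if not scores:
            raise ValueError(f"no ratings for condition {condition!r}")
        spread = stdev(scores) if len(scores) > 1 else 0.0
        table[condition] = (mean(scores), spread)
    return table
```

With 8 subjects, as in the test above, each MOS point in Fig. 7 is the mean of 8 scores for one codec and packet loss rate.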
Fig. 6: Effect of bandwidth on video quality (PSNR in dB versus bandwidth, 200-1000 kbps, for H264, H263, MPEG4, and VP8).

V. CONCLUSION

This work shows that video telephony can be implemented using a SIP server with free softphones. The video quality is evaluated using subjective and objective quality measurements. We conclude that, under the same conditions, H264 provides the best video quality compared with H263, MPEG4, and VP8 in the objective tests, a result confirmed by the subjective quality assessment. In future work, the computational load needs to be evaluated, especially on handheld devices such as smart phones and tablets.

REFERENCES

[1] Y. Xu, C. Yu, J. Li, and Y. Liu, "Video Telephony for End-Consumers: Measurement Study of Google+, iChat, and ," IEEE/ACM Trans. Networking, a future issue of this journal, accepted Apr. 07, 2013.
[2] J. Rosenberg, H. Schulzrinne, et al., "SIP: Session Initiation Protocol," RFC 3261, Jun. 2002.
[3] M. Cortes, J. R. Ensor, and J. O. Esteban, "On SIP Performance," Bell Labs Technical Journal, vol. 9, no. 3, pp. 155-172, Nov. 2004.
[4] MPEG-4 Part 10 AVC (H.264) Video Encoding, Scientific Atlanta, Jun. 2005.
[5] R. Schafer, T. Wiegand, and H. Schwarz, "The Emerging H.264/AVC Standard," Heinrich Hertz Institute, Berlin, Germany.
[6] S. Winkler and P. Mohandas, "The Evolution of Video Quality Measurement: From PSNR to Hybrid Metrics," IEEE Trans. Broadcasting, vol. 54, no. 3, pp. 660-668, Sep. 2008.
[7] ITU-T Rec. P.800, Methods of Subjective Determination of Transmission Quality, International Telecommunication Union, Aug. 1996.
[8] ITU-T Rec. P.910, Subjective Video Quality Assessment Methods for Multimedia Applications, International Telecommunication Union, Dec. 1998.
[9] ITU-T Rec. P.911, Subjective Audiovisual Quality Assessment Methods for Multimedia Applications, International Telecommunication Union, Dec. 1998.
[10] H. R. Wu and K. R. Rao, "Video quality testing," in Digital Video Image Quality and Perceptual Coding, Eds. CRC Press, 2006, ch. 4.
[11] C. Dubuc, D. Boudreau, and F. Patenaude, "The Design and Simulated Performance of a Mobile Video Telephony Application for Satellite Third-Generation Wireless Systems," IEEE Trans. Multimedia, vol. 3, no. 4, pp. 424-431, Dec. 2001.
[12] J. L. Mannos and D. J. Sakrison, "The effects of a visual fidelity criterion on the encoding of images," IEEE Trans. Inf. Theory, vol. 20, no. 4, pp. 525-536, Jul. 1974.
[13] S. Winkler, "Perceptual video quality metrics: a review," in Digital Video Image Quality and Perceptual Coding, H. R. Wu and K. R. Rao, Eds. CRC Press, 2005, ch. 5.
[14] Digium, Inc., "Asterisk," [Online]. Available: http://www.asterisk.org.
[15] Belledonne Communications, "Linphone," [Online]. Available: http://www.linphone.org/eng/download/.
[16] e2eSoft, "VCam: Webcam emulator," [Online]. Available: http://www.e2esoft.cn/vcam/.
[17] Microsoft Research Asia, "Network Emulator for Windows Toolkit (NEWT)," [Online]. Available: http://blogs.msdn.com/b/lkruger.
[18] A. Tirumala, F. Qin, J. Dugan, J. Ferguson, and K. Gibbs, "Iperf," [Online]. Available: http://dast.nlanr.net/Projects/Iperf/.