Implementation and Quality Evaluation of Video Telephony Using Session Initiation Protocol
Taweesak Samanchuen and Supaporn Kiattisin
Technology of Information System Management Division, Faculty of Engineering
Mahidol University, Puttamonthon, Nakorn Pathom 73170, Thailand
E-mail: [email protected], Phone: (662) 889-2138 ext. 6300
978-616-361-823-8 © 2014 APSIPA, APSIPA 2014

Abstract: Video telephony is a major change in communication technology and has become a standards-based technology available on personal computers and mobile phones today. It builds on several advanced technologies, which can be simplified into two fundamental parts, i.e., server and client. The server enables the clients to stay connected to each other, while the client captures audio and video data for transmission and converts received data back into audio and video signals. In our implemented system, a Session Initiation Protocol (SIP) server is used as the video telephony server. Video codecs such as VP8, MPEG4, H263, and H264 are used for video compression in the client. The performance of the codecs is evaluated by objective and subjective quality tests. The results show that the implemented system works properly, and the video quality of each codec is also evaluated.

I. INTRODUCTION

The internet connects people through various communication channels such as e-mail, chat programs, social networking, voice over IP (VoIP), and video telephony. Video telephony is a communication channel that can serve both personal and official purposes. It lets users communicate in real time and feel as if they were sitting in front of each other. The popular free video telephony services today are “Skype”, acquired by Microsoft Corporation, and “FaceTime”, developed by Apple Inc. Although these two services are free of charge, users must sign in as members and connect to the providers' servers over the internet. Because of this internet connection requirement, they cannot be used to set up a private video telephony system within a LAN or WAN for a company work group.

Video telephony uses several advanced technologies. Fig. 1 shows a simple block diagram of the video telephony system, which consists of a video telephony server and clients. The clients can be any kind of smart device, such as a personal computer (PC), smart phone, or tablet, with a camera, a microphone, a display, and a speaker. The server handles client registration and call initiation, while the client functions as the end device, i.e., initiating the call request, capturing voice and video, and performing digital signal processing.

Fig. 1: Simple block diagram of the video telephony system (blocks: Video Telephony Server, Internet, Client A, Client B).

The video telephony server can be implemented with several protocols such as H.323, Mobile Status Notification Protocol (MSNP), Extensible Messaging and Presence Protocol (XMPP), or Session Initiation Protocol (SIP). SIP has become dominant in telephony because its text-based protocol architecture allows new services to be introduced faster and with greater flexibility than many alternative multimedia communication protocols in use today [2], [3]. In our study, a SIP server serves as the video telephony server.

Another important part of video telephony is video compression, which can be split into two processes: coding and decoding. At present, many compression standards are available for coding video and audio. The very first compression standards launched in 1993 were MPEG 1 and MPEG 2, which were developed by the Moving Picture Experts Group (MPEG). In 1998, the Video Coding Experts Group (VCEG) was responsible for H.261, which was widely used for video conferencing and was later succeeded by H.263. These two groups formed the Joint Video Team and prepared the international standard H.264/MPEG4 Part 10 Advanced Video Coding (AVC). An overview of the technical features of H.264/AVC can be found in [4] and [5]. In May 2010, Google started a new open media project called “WebM”, dedicated to developing an open media format for the web that is available to everyone. At the core of the project is a new open source video compression format, VP8. This format was originally developed by a small research team at On2 Technologies, Inc. as a successor to its VPX family of video codecs. Compared to other video coding formats, VP8 has many distinctive technical features that help it achieve high compression efficiency and low computational complexity for decoding at the same time.

The objective of our work is to implement a video telephony system for wireless networks and to evaluate the performance of audio and video codecs in the video telephony environment. The paper is organized as follows: Section II explains how the performance of video telephony is evaluated. Section III describes the system implementation. Experimental results and conclusions are given in Sections IV and V, respectively.

II. VIDEO TELEPHONY PERFORMANCE EVALUATION

Video telephony consists of two perceived signals, i.e., audio and video. An evaluation of the overall performance must consider both signals simultaneously. As described in [6], the quality measurement of video telephony can be classified into two groups, i.e., subjective quality and objective quality.

A. Subjective Quality

Subjective quality is the quality of the signal as perceived by people, measured through an assessment. In a subjective video quality experiment, a number of people are asked to watch a set of video clips and rate their quality. The average rating that viewers give to a clip is known as the Mean Opinion Score (MOS). Many subjective testing methods exist. The International Telecommunication Union (ITU) has published various recommendations [7]-[9]. They specify standard viewing conditions, criteria for the selection of observers and test material, assessment procedures, and data analysis methods. The recommendations include implicit comparisons such as the Double Stimulus Continuous Quality Scale (DSCQS) and explicit comparisons such as the Double Stimulus Impairment Scale (DSIS). More detail on subjective testing can be found in [10].

Subjective testing is very important for multimedia quality testing. Its main weakness is that it requires a large number of testers, which may limit the amount of video material that can be rated in a reasonable amount of time. Nevertheless, subjective testing remains the benchmark for all objective quality metrics.

B. Objective Quality

A well-known objective quality measure for video processing is the peak signal-to-noise ratio (PSNR). PSNR is popular for many reasons, but its most outstanding feature is that its formula is very simple to understand and implement. For a video sequence with original M × N images u and a total of K frames, PSNR is defined as the ratio of the total maximum (peak) pixel power to the total mean squared error (MSE), as follows:

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\dfrac{\sum_{k,m,n} \left( u(k,m,n) - \hat{u}(k,m,n) \right)^2}{KMN}}, \tag{1}$$

where $255^2$ is the maximum pixel power, $u(k,m,n)$ is the $n$-th pixel of the $m$-th row in the original image at time index $k$, and $\hat{u}(k,m,n)$ is the corresponding pixel in the reproduced image [11].

Because PSNR is based on a byte-by-byte comparison of the data without considering what the video actually represents, it has only an approximate relationship with video quality. To overcome this problem, other metric designs have been proposed [12], [13]. They can be classified into two groups, i.e., the vision modeling approach and the engineering approach; more detail can be found in [13]. However, to simplify the test, PSNR is used in this work.

III. SYSTEM IMPLEMENTATION

As shown in Fig. 1, there are two parts of the system to implement: client and server. PCs and smart phones are used as clients, while the server is implemented on a PC. The diagram of the implemented system is shown in Fig. 2, with the hardware specification given in Table I.

Fig. 2: Simple block diagram of the implemented system (server: SIP server on Linux OS; PC: softphone on Microsoft Windows OS; smart phone: softphone on iOS; access point; router).

TABLE I: Hardware specification
  Device                 CPU                          RAM
  Server                 Intel Core i3, 2.5 GHz       4 GB
  Client - PC            Intel Atom N270, 1.667 GHz   4 GB
  Client - Smart Phone   ARM Cortex-A8, 1 GHz         512 MB

Fig. 3: Network testbed for measurement (blocks: SIP server; SIP client (transmitter); SIP client (receiver); network emulator; Iperf client; Iperf server; video call links).

Fig. 4: Example image from the implemented video telephony system.

TABLE II: Test condition
  Video resolution        CIF (352x288)
  Frame rate              5 - 30 fps
  Video bit rate          100 kbps - 1000 kbps
  Codec                   H263, H264, MPEG4, VP8
  Video sequences         Foreman, Akiyo
  Packet loss rate        0 - 0.05

measure factor for this test, the smart phone was replaced

A variety of open source projects are selected and applied to each part of the system. On the server side, the open source project “Asterisk” [14] is used as the SIP server. On the client side, the open source project “Linphone” [15] is used as the video softphone. Linphone runs on different platforms such as PC, tablet, and smart phone, which makes it suitable for a wireless network system. For the SIP server to function correctly and enable video calls, the server must be configured properly. The configuration on our server is as follows:

    ; sip.conf
    [general]
    port=5060
    tcpenable=yes
    tcpbindaddr=0.0.0.0
    bindaddr=0.0.0.0
    allowoverlap=no
    videosupport=yes
    allow=h263
    allow=h264
    allow=VP8
    allow=MPEG4
    context=default

    [2000]
    type=friend
    secret=1234
    host=dynamic

    [2001]
    type=friend
    secret=1234
    host=dynamic

The given configuration shows that each audio and video encoding must be enabled before it can be used.
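As an illustration of the point that every codec has to be enabled explicitly, the following minimal Python sketch reads a configuration in the style of the sip.conf listing and reports whether video support is switched on, which codecs are allowed, and which accounts are defined. It is not part of the implemented system: the helper names parse_sip_conf and video_summary are hypothetical, the listing is assumed to be available as plain text, and ';' is treated as the comment character. A small hand-written parser is used because the key allow is repeated, which Python's standard configparser module rejects by default.

```python
from collections import defaultdict

# Assumed copy of the server configuration, held as plain text for this sketch.
SIP_CONF = """\
; sip.conf
[general]
port=5060
tcpenable=yes
tcpbindaddr=0.0.0.0
bindaddr=0.0.0.0
allowoverlap=no
videosupport=yes
allow=h263
allow=h264
allow=VP8
allow=MPEG4
context=default

[2000]
type=friend
secret=1234
host=dynamic

[2001]
type=friend
secret=1234
host=dynamic
"""


def parse_sip_conf(text):
    """Parse INI-like text into {section: {key: [values]}}.

    Repeated keys (such as several 'allow=' lines) are all kept, in order.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = defaultdict(list)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            sections[current][key.strip()].append(value.strip())
    return sections


def video_summary(sections):
    """Return (video enabled?, allowed codecs in lowercase, account names)."""
    general = sections.get("general", {})
    video_on = general.get("videosupport", ["no"])[-1].lower() == "yes"
    codecs = [codec.lower() for codec in general.get("allow", [])]
    peers = sorted(name for name in sections if name != "general")
    return video_on, codecs, peers


if __name__ == "__main__":
    enabled, codecs, peers = video_summary(parse_sip_conf(SIP_CONF))
    print("video enabled:", enabled)
    print("allowed codecs:", codecs)
    print("accounts:", peers)
```

A check of this kind can be run before a test session to confirm that the codec under evaluation appears in the allow list; a codec that is missing from the list would have to be added to the configuration first.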