
Video-conference Communication Platform Based on WebRTC Online meetings

Jelena Caiko
Institute of Electrical Engineering and Electronics, FEEE
Riga Technical University
Riga, Latvia

Antons Patlins
Institute of Electrical Engineering and Electronics, FEEE
Riga Technical University
Riga, Latvia

Arapov Nurlan
LLP Granit
Almaty, Kazakhstan

Vladimir Protsenko
LLP Granit
Almaty, Kazakhstan

Abstract: Due to the pandemic, many organizations have switched to remote work via video conferencing. Organizing video conferences is complicated, however, by limits on the number of simultaneous participants and by the need for large, fully equipped conference rooms. This may not be a problem for a large enterprise, but for small and medium-sized businesses, access to video conferencing facilities either requires extremely high rents or is completely impossible. The goal of this article is to design and implement an open source video conferencing prototype that addresses these meeting challenges.

Keywords: video conferencing, VoIP, Jitsi, WebRTC, video, audio communication.

I. INTRODUCTION

Video-based teleconferencing is a form of teleconferencing carried out over media that support video and audio communication. It is a live video connection between people in separate locations for the purpose of communication or interaction [1].

Videoconferencing is communication by electronic means that makes it possible to hold business meetings, seminars, and conferences remotely. This kind of communication saves time and money.

Video conferencing requires latency of less than a second. Protocols must therefore support negotiation of bit rates, codecs, and broadcast methods.

In browsers, minimal latency is provided only by Flash-based protocols, RTMP/RTSP, WebRTC, and SDLP, and only the latter two work without plugins. In addition, WebRTC supports peer-to-peer (P2P) data transfer, either through intermediate TURN servers or directly within the same network, which in theory makes it possible to dispense with transcoding servers, save channel bandwidth, and offload central servers.

Unfortunately, things are not so simple on the server side, because mixing video content requires huge processing resources. To mix video, the server must decode all incoming frames (one per participant), scale each of them down, build the composite image, and then re-encode it. For a typical non-HD stream at 25 frames per second, this means 25 downscaling operations per participant, 25 compositions, and 25 encodings every second.

Mixing also causes a quality problem: with a single shared mix, participants hear an echo of their own audio, and all users see exactly the same video stream, including their own image. This is resolved by producing a separate mix for each participant, although doing so sharply increases the load in conferences with mixed content.

During the coronavirus (COVID-19) pandemic, many people work and hold meetings online. Many users have opted for free video conferencing solutions. Experts criticize Zoom for weak privacy and continue to find vulnerabilities: joining other people's conversations, transfer of data from the iOS application to Facebook, lack of end-to-end encryption, and the ability to steal Windows account passwords and gain full access to macOS.

Such services have also turned out to be susceptible to abuse such as "Zoom bombing", in which an outsider connects to a web conference and disrupts the meeting. It is therefore better to use a more secure and private video conferencing solution.

Video communication in education, where distance learning takes place over the Internet, has its own characteristics [17], [18]. Because presenters cannot directly control the channel quality on the listeners' side, constant monitoring and feedback on video quality and related problems are required [19], [20]. The created platform allows training in videoconference mode [21], [2], [3]. It thus became necessary to find a way to create a program that allows video conferencing with data protection.

In our project we use Jitsi, which provides the software necessary to create your own video conferencing service that can be deployed on a virtual server.

Jitsi is an open source program that supports secure audio/video calls and conferences, streaming, desktop sharing, file transfer, and many other features. Using it does not require registering an account, and the program itself runs in a browser.

We chose Jitsi for its ability to mix audio and host conference calls. That is how our conference functions were first implemented.

II. AUDIO CONFERENCING

Jitsi Meet is a full video conferencing application that includes web, Android, and iOS clients. We operate Jitsi Meet for communication in small groups of 2-10 people.

We started with conferences in which small teams could get together and have discussions. We used and tested different SIP [5] and XMPP [6] servers. For the first implementation we used Jitsi because it is capable of mixing audio and hosting conference calls.

XMPP also supports gateways to other networks. Any user can "register" with one of these gateways, providing the information necessary to log on to the network, and can then communicate with that network's users as if they were users of the same network. This means that any client that fully supports XMPP can be used to access any network for which gateways exist, without any additional code in the client and without requiring the client to have direct Internet access (Fig. 1).

Today, video conferencing clients still send the same streams (video from a desktop or a camera), but compared to the mixing approach described in the Introduction, instead of receiving one stream in return, they directly receive everyone else's packets exactly as they were sent. One of the technologies that allows the use of a simplified server is layer-by-layer video encoding on the terminal (Fig. 2).
Each layer increases the image resolution, so when transmitting video to another terminal the server does not need to re-encode anything: it only has to select as many layers as are needed for the resolution to match the capabilities of that terminal and of the communication channel to it.

Fig. 2. Layer-by-layer video encoding on the terminal
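
As a rough illustration of the layer selection described above, the following Python sketch decides how many layers a server would forward to one terminal. The `Layer` class, the `select_layers` function, and the resolutions and bit rates are hypothetical values chosen for illustration; they are not taken from the platform described in this paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    """One layer of a layered video stream (illustrative values only)."""
    height: int  # vertical resolution reached once this layer is added
    kbps: int    # extra bit rate this layer adds on top of the lower layers


def select_layers(layers: List[Layer], max_height: int, budget_kbps: int) -> int:
    """Return how many layers, counted from the base layer, to forward.

    The server never re-encodes: it only stops adding layers once the
    terminal's resolution or the channel's bit-rate budget would be exceeded.
    A result of 0 means even the base layer does not fit.
    """
    count, used_kbps = 0, 0
    for layer in layers:
        if layer.height > max_height or used_kbps + layer.kbps > budget_kbps:
            break
        used_kbps += layer.kbps
        count += 1
    return count


# Hypothetical three-layer stream: 180p base plus two enhancement layers.
LAYERS = [Layer(180, 150), Layer(360, 350), Layer(720, 1000)]

print(select_layers(LAYERS, max_height=360, budget_kbps=2000))  # 2
print(select_layers(LAYERS, max_height=720, budget_kbps=400))   # 1
```

In the first call the terminal's screen limits the choice; in the second, the channel does. In both cases the server only drops layers and performs no decoding or encoding.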

The terminal can also prepare several complete video streams at different resolutions (for example, CIF, SD, and HD) (Fig. 3). In this case, the server's task likewise reduces to selecting the appropriate stream to send to each terminal. With these arrangements, the MCU server becomes, in effect, a video stream router.

Fig. 1. Ad hoc audio conferencing with Jitsi

The gateway implementation is specific to the XMPP server and is subject to instability due to the closed nature of commercial IM services.

When developing video conferencing, we created a convenient user interface that displays the full list of video conference participants and lets all participants see each other's audio activity. Extensions to RTP and XMPP were used to implement these features [7], [8], [9].
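
The audio activity display can be driven by the per-packet audio level carried in the RTP header extension of RFC 6464 [7], whose one-octet payload holds a voice activity flag in the top bit and the level in -dBov in the lower seven bits. A minimal sketch of decoding that payload is given below; the function names and the rule for picking the loudest participant are illustrative assumptions, not the Jitsi implementation.

```python
from typing import Dict, Optional, Tuple


def parse_audio_level(payload: int) -> Tuple[bool, int]:
    """Decode the one-octet payload of the RFC 6464 audio level extension.

    Returns (voice_flag, level_dbov), where level_dbov ranges from
    0 (loudest) down to -127 (silence).
    """
    if not 0 <= payload <= 0xFF:
        raise ValueError("payload must fit in one octet")
    voice_flag = bool(payload & 0x80)   # top bit: voice activity flag
    level_dbov = -(payload & 0x7F)      # low 7 bits: magnitude in -dBov
    return voice_flag, level_dbov


def loudest_participant(payloads: Dict[str, int]) -> Optional[str]:
    """Pick the participant with the highest level; None if all are silent.

    Simplification (an assumption of this sketch): a single packet per
    participant is compared, with no smoothing over time.
    """
    best_name, best_level = None, -127
    for name, payload in payloads.items():
        _, level = parse_audio_level(payload)
        if level > best_level:
            best_name, best_level = name, level
    return best_name


# Hypothetical payload octets from three participants.
sample = {"alice": 0x80 | 20, "bob": 45, "carol": 127}
print(parse_audio_level(sample["alice"]))  # (True, -20)
print(loudest_participant(sample))         # alice
```

A real user interface would typically average levels over a short window before highlighting a speaker, so that brief noises do not cause flicker.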

Fig. 3. Video Streaming Router

III. VIDEO CONFERENCING

After implementing audio in Jitsi, we added video streams, allowing participants to see each other during calls and to exchange slides or screens using Jitsi's desktop sharing features. Jitsi makes it easy to mix audio and video on the client side.

The number of potential video conferencing terminals has increased thanks to the development of WebRTC technology, which makes it possible to participate in video conferencing sessions from a browser without installing any additional software or downloading a plug-in.

It is especially important that the video conferencing server supports effective quality of service (QoS) mechanisms, including traffic prioritization, and that it can operate over channels with packet loss.

It is desirable that the server supports technology that compensates for packet loss: up to 1% of packets without noticeable degradation and up to 5% of packets without critical degradation of image quality. If quality deteriorates further or the bandwidth of the communication channel decreases, the most critical streams should be maintained, primarily those that carry audio. Even if the video disappears, voice can always convey the necessary information, so support for voice communication is the most critical.

We take a more efficient approach. Instead of receiving a single mixed stream, each client receives the other participants' packets exactly as they were sent. Having received individual streams, user agents can display them in any way they or their users choose. The quality is better because video streams are encoded only once. There is no additional decoding, scaling, or encoding, which results in stable latency.

Overall, video relaying requires hundreds of times fewer resources than mixing (Fig. 4). If done correctly, relaying can even be implemented in routers.

Fig. 4. Relayed video conferences

Jitsi Meet is a JavaScript application that uses WebRTC and can work with servers based on Jitsi Videobridge (a gateway that broadcasts video streams to video conference participants) (Fig. 5).

We ran the tests on Jitsi Meet on a dedicated server. The server hosted all the services for Jitsi Meet: a web server (nginx), an XMPP server (Prosody), and Jitsi Videobridge.

During testing, we found that loading Jitsi Videobridge heavily requires sending it a sufficiently large amount of traffic. Filling enough conferences with real browsers would likely require hundreds or even thousands of computers.

To solve this problem, a full star topology was used to create a conference. With this topology, inbound traffic from any endpoint is propagated to all other endpoints connected to Jitsi Videobridge. In this configuration, with A endpoints, A × (A - 1) streams leave the video bridge, with A - 1 streams arriving at each endpoint.

Fig. 5. Conference call with Jitsi

Jitsi Videobridge is a component of the XMPP server that makes it possible to organize multi-user video communication. Unlike expensive dedicated hardware video bridges, it does not mix video channels into a composite video stream but only relays the received video channels to all participants in the call. Processor power is therefore not critical for its operation.

Jitsi Meet supports features such as desktop and window sharing, automatic video switching to the active speaker, collaborative document editing in Etherpad, presentation sharing, streaming a conference to YouTube, audio conferencing, participant connectivity through the Jigasi phone gateway, password protection, push-to-talk mode, invitations by URL, and text chat.

All data flows between the client and the server are encrypted (assuming that the server runs on your own infrastructure). Jitsi Meet is available as a separate application (including for Android and iOS) and as a library for integration into websites.

In general, Jitsi is an excellent video conferencing solution that is conveniently written and easily embedded in related systems.

We created the Jitsi Meet application window and a conference with interconnection between users, as well as the full web functionality available to participants, from the official Jitsi repository [16].

We used a Makefile to build the application and put everything together: libraries and files (Fig. 6).

The GnuGk video bridge is written in C++, and its server part is based on H323Plus [4]. Its code contains minor errors, is written carelessly, and is hard to navigate, but we assume that its potential is considerable.

We created a WebRTC-friendly video bridge that is controlled via XMPP. It is a server component, which has a number of advantages:
- ease of authentication: when a trusted connection is made to the appropriate server, the component receives correct "from" addresses, and all that remains is to make sure that these addresses match the domain for which it is configured;
- ease of video bridge integration into various deployments: there is no need to enforce an authentication policy or maintain a user base.

XMPP has a service discovery capability. The XMPP client can simply send a request to its server and get a list of all available components with their supported features. With all of these benefits, Jitsi will only enable its video bridge features in environments where they can actually be used.
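
The discovery exchange mentioned above can be sketched with XMPP Service Discovery (XEP-0030). The snippet below builds a `disco#info` request and checks a reply for a video bridge feature using only the Python standard library. The sample reply, the JIDs, and the helper names are hypothetical; the Colibri namespace is assumed to be the one defined in XEP-0340, and no network connection or XMPP library is involved.

```python
import xml.etree.ElementTree as ET

DISCO_INFO = "http://jabber.org/protocol/disco#info"   # XEP-0030 namespace
COLIBRI = "http://jitsi.org/protocol/colibri"          # assumed, per XEP-0340


def build_disco_info_request(to_jid: str, request_id: str) -> str:
    """Serialize an <iq type='get'> asking an entity for its features."""
    iq = ET.Element("iq", {"type": "get", "to": to_jid, "id": request_id})
    ET.SubElement(iq, "query", {"xmlns": DISCO_INFO})
    return ET.tostring(iq, encoding="unicode")


def supported_features(iq_result_xml: str) -> set:
    """Collect the 'var' attribute of every <feature/> in a disco#info reply."""
    root = ET.fromstring(iq_result_xml.strip())
    return {f.get("var") for f in root.iter("{%s}feature" % DISCO_INFO)}


# Hand-written example reply; a real component may advertise other features.
SAMPLE_RESULT = """
<iq type='result' from='bridge.example.org' id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <identity category='component' type='conference'/>
    <feature var='http://jitsi.org/protocol/colibri'/>
    <feature var='urn:xmpp:jingle:apps:rtp:video'/>
  </query>
</iq>
"""

print(build_disco_info_request("bridge.example.org", "disco1"))
print(COLIBRI in supported_features(SAMPLE_RESULT))  # True
```

A client would enable its video bridge features only when such a check succeeds, which matches the behavior described in the text.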

Fig. 6. Code: Makefile

In other words, by using separate streams, user agents can display them in any way that they or their users choose. Encoding video streams only once has several advantages: the quality is improved and the delay does not increase. Replacing mixing with relaying also uses far fewer resources.

IV. VIDEOCONFERENCING WITH JITSI

Jitsi Videobridge is not a SIP or XMPP server that user agents call in order to join conferences. It is a remotely controlled server component [10]: a server-side media component that mixes audio and relays video. Performance and scalability are among its main goals.

During the experiment, it turned out that as the number of endpoints increases, each of them being a destination to which traffic from all the others is encrypted and delivered, the number of actual video streams grows quadratically and generates load. In our test, A = 9, 14, 19, 27, 34.

For the conference, we used the Last N mode of Jitsi Videobridge to distribute and show only the video of the most recent speakers.

When generating load, what matters is the number of streams and their bit rate.

Our goal is that, for a large number of video streams, the same amount of CPU resources and bandwidth is required regardless of whether a thousand endpoints receive one stream each or a hundred endpoints receive ten each.

V. VIDEO BRIDGE CONNECTION

The VoIP trapezoid is used to connect users to the signaling server authorized for their domain (Fig. 7).

Fig. 7. SIP trapezoid

These servers are used to initiate, modify, or terminate a call.

With SIP and XMPP, software or hardware user agents include their IP addresses and port numbers when initiating, modifying, or ending a call, and then use them to exchange media. With protocols such as [12], [13], and [14] in the model, media and signaling are transmitted along different paths.

As explained earlier [15], the Colibri protocol allows conference organizers to communicate with the video bridge as if they were interacting with it locally. Instead of allocating ports locally, however, clients allocate them on a remote component. First, the conference organizer allocates port numbers on the video bridge before establishing a call (Fig. 8.a).
Using the address and port that it received from the bridge, the organizer then establishes regular sessions with the other participants (Fig. 8.b). In the next step, the conference participants send their media to the video bridge and receive media from it (Fig. 8.c). After that, the conference organizer can choose to send additional information to the list of participants.

Fig. 8. Jitsi Videobridge: a) channel allocation, b) session initiation, and c) media relaying

The conference organizer is special: it can interact with the video bridge directly through signaling. The other participants are free to support this interaction or to use a specific signaling protocol.

When using Jitsi, the video bridge is currently used in the context of XMPP calls.

VI. WEBRTC

Because it separates signaling and media in a similar way, the WebRTC architecture is similar to that of the SIP and XMPP protocols. WebRTC also uses the same protocols for media transfer (SRTP) and NAT traversal (ICE). The result of the work described in this article is a Jitsi Videobridge that is compatible with browsers. To create such a solution, the following tasks were carried out:
- SSRC preallocation;
- implementation of DTLS/SRTP key agreement support;
- Trickle ICE.

This allowed Jitsi Videobridge to be used in pure WebRTC deployments or in environments where SIP and/or XMPP clients coexist with browsers.

VII. CONCLUSION

The most advanced and reliable video conferencing servers today are software solutions based on servers of standard architecture. They offer the optimum balance of manufacturability, quality, reliability, and cost. Such platforms are as flexible as possible: they can be easily customized and adapted to the needs of any organization.

When choosing among software platforms, attention should be paid to the technical characteristics of the hardware and to the compatibility of the platform with the solutions already installed in your organization or in the organizations with which you will communicate via video. WebRTC support is also worth having.

Creating Jitsi Videobridge was of paramount importance for an open source video conferencing project, on the condition of its further adoption.

In the future, as this project develops, it will be necessary to support mobile clients, to use scalable video coding (SVC) and simulcast (simultaneous broadcasting), and to relay and conduct large-scale conferences. Moreover, Jitsi Videobridge implements switchable, or selective, video relaying so that only some streams are displayed on mobile devices, which makes it mobile friendly. SVC and simulcast are important because they optimize bandwidth consumption and improve error resilience as well as usability. Retransmission strategies are needed to support online classes and large-scale conferences.

There is a certain weakness in privacy: data is decrypted and re-encrypted on the Jitsi Meet server. The way around this is to install the Jitsi Meet software on a private server that you manage, so that all data remains safe.

REFERENCES

[1] K. V. Rop, "Video conferencing and its application in distance learning," Annual Interdisciplinary Conference, The Catholic University of Eastern Africa, Nairobi, Kenya, June 2012, vol. 1.
[2] N. Kunicina, A. Ziravecka, A. Patlins, J. Caiko, L. Ribickis, "Towards of e-learning quality standards for Electrical Engineers," Information Systems, Technology and Management: 6th International Conference ICISTM-2012, Grenoble, France, Proceedings, 2012, pp. 292-303. ISBN: 978-3-642-29165-4.
[3] A. Patlins, N. Kuņicina, J. Čaiko, "Information Tools for Education of Electrical Engineers," Proceedings of the 6th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, Prague, Czech Republic, 2011.
[4] ITU-T Recommendation H.323, Packet-based multimedia communications systems.
[5] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, RFC 3261: Session Initiation Protocol. Internet Engineering Task Force, June 2002.
[6] P. Saint-Andre, RFC 6120: Extensible Messaging and Presence Protocol (XMPP): Core. Internet Engineering Task Force, March 2011. ISSN: 2070-1721.
[7] J. Lennox, E. Ivov, and E. Marocco, "A Real-time Transport Protocol (RTP) Header Extension for Client-to-Mixer Audio Level Indication," Internet Engineering Task Force, RFC 6464 (Status: Standards Track), December 2011.
[8] E. Ivov, E. Marocco, and J. Lennox, "A Real-time Transport Protocol (RTP) Header Extension for Mixer-to-Client Audio Level Indication," Internet Engineering Task Force, RFC 6465 (Status: Standards Track), December 2011. ISSN: 2070-1721.
[9] E. Ivov and E. Marocco, "XEP-0298: Delivering Conference Information to Jingle Participants (Coin)," XMPP Standards Foundation, XEP-0298, 2015-07-02.
[10] LibJitsi: An advanced Java media library for secure real-time audio/video communication, https://jitsi.org/libjitsi
[11] A. Johnston, SIP: Understanding the Session Initiation Protocol, Second Edition. Artech House, Inc., Norwood, MA, USA, 2003.
[12] J. Rosenberg, R. Mahy, P. Matthews, D. Wing, RFC 5389: Session Traversal Utilities for NAT (STUN). Internet Engineering Task Force, October 2008.
[13] R. Mahy, P. Matthews, and J. Rosenberg, RFC 5766: Traversal Using Relays around NAT (TURN): Relay Extensions to Session Traversal Utilities for NAT (STUN). Internet Engineering Task Force, April 2010.
[14] J. Rosenberg, Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols. Internet Engineering Task Force, April 2010.
[15] J. Caiko, V. Kirpun, V. Protsenko, "Problems using video conferencing via internet," Infocommunication Technologies: Current State and Paths of Development, Proceedings of the International Scientific and Practical Conference dedicated to the 100th anniversary of the Signal Corps, December 10, 2019, pp. 82-85 (in Russian).
[16] https://github.com/jitsi/jitsi-meet/blob/master/app.js
[17] A. Zabašta, A. Žiravecka, N. Kuņicina, J. Čaiko, L. Ribickis, "Collaborative Learning Outcomes for Creation of Industry-oriented Curricular: a Case Study of ERASMUS+ Project Physics," in 2019 IEEE Global Engineering Education Conference (EDUCON 2019), Dubai, United Arab Emirates, 9-11 April 2019. Piscataway: IEEE, 2019, pp. 685-692.
[18] I. Bula, E. Hajrizi, N. Kuņicina, "Demonstration of the Use of Robotics in the Development of a Scrap Processing Model for Mechatronic Education," in 2019 IEEE 60th International Scientific Conference on Power and Electrical Engineering of Riga Technical University (RTUCON2019), Riga, Latvia, 7-9 October 2019. Piscataway: IEEE, 2019, pp. 1-6.
[19] N. Kuņicina, A. Žiravecka, L. Ribickis, A. Zabasta, I. Bilic, "Development of Entrepreneurship Skills for Innovation Driven Career in Adaptronics," in 2019 IEEE 60th International Scientific Conference on Power and Electrical Engineering of Riga Technical University (RTUCON2019), Riga, Latvia, 7-9 October 2019. Piscataway: IEEE, 2019, pp. 1-6.
[20] K. Bērziņa, A. Žiravecka, N. Kuņicina, J. Caiko, "Promoting of Lifelong Learning in Engineering," in 2019 IEEE 60th International Scientific Conference on Power and Electrical Engineering of Riga Technical University (RTUCON2019), Riga, Latvia, 7-9 October 2019. Piscataway: IEEE, 2019, pp. 1-6.
[21] N. Kuņicina, A. Zabašta, R. Bruzgiene, N. Dubauskiene, A. Patļins, L. Ribickis, "Student Engagement in Cross-Domain Innovation Development and the Impact of IT on Learning Outcomes and Career Development in Electrical Engineering," in 2019 IEEE Global Engineering Education Conference (EDUCON 2019), Dubai, United Arab Emirates, 9-11 April 2019. Piscataway: IEEE, 2019, pp. 693-700. ISBN 978-1-5386-9507-4. e-ISBN 978-1-5386-9506-7. ISSN 2165-9559. e-ISSN 2165-9567. Available from: doi:10.1109/EDUCON.2019.8725269

Jelena Caiko, Dr.sc.ing., is a Leading Researcher at RTU. She is the author of 45 scientific publications. Her fields of scientific interest are electrical engineering, telecommunication, machine learning, and wireless networks.