Sonja Gievska • Gjorgji Madjarov (Eds.)
ICT Innovations 2019 Big Data Processing and Mining 11th International Conference, ICT Innovations 2019 Ohrid, North Macedonia, October 17–19, 2019 Proceedings
123 Preface
The ICT Innovations conference series, organized by the Macedonian Society of Information and Communication Technologies (ICT-ACT) is an international forum for presenting scientific results related to innovative fundamental and applied research in ICT. The 11th ICT Innovations 2019 conference that brought together academics, students, and industrial practitioners, was held in Ohrid, Republic of North Macedonia, during October 17–19, 2019. The focal point for this year’s conference was “Big Data Processing and Mining,” with topics extending across several fields including social network analysis, natural language processing, deep learning, sensor network analysis, bioinformatics, FinTech, privacy, and security. Big data is heralded as one of the most exciting challenges in data science, as well as the next frontier of innovations. The spread of smart, ubiquitous computing and social networking have brought to light more information to consider. Storage, integration, processing, and analysis of massive quantities of data pose significant challenges that have yet to be fully addressed. Extracting patterns from big data provides exciting new fronts for behavioral analytics, predictive and prescriptive modeling, and knowledge discovery. By leveraging the advances in deep learning, stream analytics, large-scale graph analysis and distributed data mining, a number of tasks in fields like, biology, games, robotics, commerce, transportation, and health care have been brought within reach. Some of these topics were brought to the forefront of the ICT Innovations 2019 conference. This book presents a selection of papers presented at the conference which contributed to the discussions on various aspects of big data mining (including algo- rithms, models, systems, and applications). The conference gathered 184 authors from 24 countries reporting their scientific work and solutions in ICT. Only 18 papers were selected for this edition by the international Program Committee, consisting of 176 members from 43 countries, chosen for their scientific excellence in their specific fields. We would like to express our sincere gratitude to the authors for sharing their most recent research, practical solutions, and experiences, allowing us to contribute to the discussion on the trends, opportunities, and challenges in the field of big data. We are grateful to the reviewers for the dedicated support they provided to our thorough reviewing process. Our work was made easier by following the procedures developed and passed along by Prof. Slobodan Kaljadziski, the co-chair of the ICT Innovations 2018 conference. Special thanks to Ilinka Ivanoska, Bojana Koteska, and Monika Simjanoska for their support in organizing the conference and for the technical preparation of the conference proceedings.
October 2019 Sonja Gievska Gjorgji Madjarov Organization
Conference and Program Chairs
Sonja Gievska University Ss.Cyril and Methodius, North Macedonia Gjorgji Madjarov University Ss.Cyril and Methodius, North Macedonia
Program Committee
Jugoslav Achkoski General Mihailo Apostolski Military Academy, North Macedonia Nevena Ackovska University Ss.Cyril and Methodius, North Macedonia Syed Ahsan Technische Universität Graz, Austria Marco Aiello University of Groningen, The Netherlands Azir Aliu Southeastern European University of North Macedonia, North Macedonia Luis Alvarez Sabucedo Universidade de Vigo, Spain Ljupcho Antovski University Ss.Cyril and Methodius, North Macedonia Stulman Ariel The Jerusalem College of Technology, Israel Goce Armenski University Ss.Cyril and Methodius, North Macedonia Hrachya Astsatryan National Academy of Sciences of Armenia, Armenia Tsonka Baicheva Bulgarian Academy of Science, Bulgaria Verica Bakeva University Ss.Cyril and Methodius, North Macedonia Antun Balaz Institute of Physics Belgrade, Serbia Lasko Basnarkov University Ss.Cyril and Methodius, North Macedonia Slobodan Bojanic Universidad Politécnica de Madrid, Spain Erik Bongcam-Rudloff SLU-Global Bioinformatics Centre, Sweden Singh Brajesh Kumar RBS College, India Torsten Braun University of Berne, Switzerland Andrej Brodnik University of Ljubljana, Slovenia Francesc Burrull Universidad Politécnica de Cartagena, Spain Neville Calleja University of Malta, Malta Valderrama Carlos UMons University of Mons, Belgium Ivan Chorbev University Ss.Cyril and Methodius, North Macedonia Ioanna Chouvarda Aristotle University of Thessaloniki, Greece Trefois Christophe University of Luxembourg, Luxembourg Betim Cico Epoka University, Albania Emmanuel Conchon Institut de Recherche en Informatique de Toulouse, France Robertas Damasevicius Kaunas University of Technology, Lithuania Pasqua D’Ambra IAC, CNR, Italy Danco Davcev University Ss.Cyril and Methodius, North Macedonia Antonio De Nicola ENEA, Italy viii Organization
Domenica D’Elia Institute for Biomedical Technologies, Italy Vesna Dimitrievska University Ss.Cyril and Methodius, North Macedonia Ristovska Vesna Dimitrova University Ss.Cyril and Methodius, North Macedonia Ivica Dimitrovski University Ss.Cyril and Methodius, North Macedonia Salvatore Distefano University of Messina, Italy Milena Djukanovic University of Montenegro, Montenegro Ciprian Dobre University Politehnica of Bucharest, Romania Martin Drlik Constantine the Philosopher University in Nitra, Slovakia Sissy Efthimiadou ELGO Dimitra Agricultural Research Institute, Greece Tome Eftimov Stanford University, USA, and Jožef Stefan Institute, Slovenia Stoimenova Eugenia Bulgarian Academy of Sciences, Bulgaria Majlinda Fetaji Southeastern European University of North Macedonia, North Macedonia Sonja Filiposka University Ss.Cyril and Methodius, North Macedonia Predrag Filipovikj Mälardalen University, Sweden Ivan Ganchev University of Limerick, Ireland Todor Ganchev Technical University Varna, Bulgaria Nuno Garcia Universidade da Beira Interior, Portugal Andrey Gavrilov Novosibirsk State Technical University, Russia Ilche Georgievski University of Groningen, The Netherlands John Gialelis University of Patras, Greece Sonja Gievska University Ss.Cyril and Methodius, North Macedonia Hristijan Gjoreski University Ss.Cyril and Methodius, North Macedonia Dejan Gjorgjevikj University Ss.Cyril and Methodius, North Macedonia Danilo Gligoroski Norwegian University of Science and Technology, Norway Rossitza Goleva Technical University of Sofia, Bulgaria Andrej Grgurić Ericsson Nikola Tesla d.d., Croatia David Guralnick International E-Learning Association (IELA), France Marjan Gushev University Ss.Cyril and Methodius, North Macedonia Elena Hadzieva University St. Paul the Apostle, North Macedonia Violeta Holmes University of Hudersfield, UK Ladislav Huraj University of SS. Cyril and Methodius, Slovakia Sergio Ilarri University of Zaragoza, Spain Natasha Ilievska University Ss.Cyril and Methodius, North Macedonia Ilija Ilievski Graduate School for Integrative Sciences and Engineering, Singapore Mirjana Ivanovic University of Novi Sad, Serbia Boro Jakimovski University Ss.Cyril and Methodius, North Macedonia Smilka Janeska-Sarkanjac University Ss.Cyril and Methodius, North Macedonia Mile Jovanov University Ss.Cyril and Methodius, North Macedonia Milos Jovanovik University Ss.Cyril and Methodius, North Macedonia Organization ix
Zucko Jurica Faculty of Food Technology and Biotechnology, Croatia Slobodan Kalajdziski University Ss.Cyril and Methodius, North Macedonia Kalinka Kaloyanova FMI-University of Sofia, Bulgaria Aneta Karaivanova Bulgarian Academy of Sciences, Bulgaria Mirjana Kljajic Borstnar University of Maribor, Slovenia Ljupcho Kocarev University Ss.Cyril and Methodius, North Macedonia Dragi Kocev Jožef Stefan Institute, Slovenia Margita Kon-Popovska University Ss.Cyril and Methodius, North Macedonia Magdalena Kostoska University Ss.Cyril and Methodius, North Macedonia Bojana Koteska University Ss.Cyril and Methodius, North Macedonia Ivan Kraljevski VoiceINTERconnect GmbH, Germany Andrea Kulakov University Ss.Cyril and Methodius, North Macedonia Arianit Kurti Linnaeus University, Sweden Xu Lai Bournemouth University, UK Petre Lameski University of Ss. Cyril and Methodius, North Macedonia Suzana Loshkovska University Ss.Cyril and Methodius, North Macedonia José Machado Da Silva University of Porto, Portugal Ana Madevska Bogdanova University Ss.Cyril and Methodius, North Macedonia Gjorgji Madjarov University Ss.Cyril and Methodius, North Macedonia Yousef Malik Zefat Academic College, Israel Tudruj Marek Polish Academy of Sciences, Poland Ninoslav Marina University St. Paul the Apostole, North Macedonia Smile Markovski University Ss.Cyril and Methodius, North Macedonia Marcin Michalak Silesian University of Technology, Poland Hristina Mihajlovska University Ss.Cyril and Methodius, North Macedonia Marija Mihova University Ss.Cyril and Methodius, North Macedonia Aleksandra Mileva University Goce Delcev, North Macedonia Biljana Mileva Boshkoska Faculty of Information Studies in Novo Mesto, Slovenia Georgina Mirceva University Ss.Cyril and Methodius, North Macedonia Miroslav Mirchev University Ss.Cyril and Methodius, North Macedonia Igor Mishkovski University Ss.Cyril and Methodius, North Macedonia Kosta Mitreski University Ss.Cyril and Methodius, North Macedonia Pece Mitrevski University St. Kliment Ohridski, North Macedonia Irina Mocanu University Politehnica of Bucharest, Romania Ammar Mohammed Cairo University, Egypt Andreja Naumoski University Ss.Cyril and Methodius, North Macedonia Manuel Noguera University of Granada, Spain Thiare Ousmane Gaston Berger University, Senegal Eleni Papakonstantinou Genetics Lab, Greece Marcin Paprzycki Polish Academy of Sciences, Poland Dana Petcu West University of Timisoara, Romania Antonio Pinheiro Universidade da Beira Interior, Portugal Matus Pleva Technical University of Košice, Slovakia Florin Pop University Politehnica of Bucharest, Romania x Organization
Zaneta Popeska University Ss.Cyril and Methodius, North Macedonia Aleksandra University Ss.Cyril and Methodius, North Macedonia Popovska-Mitrovikj Marco Porta University of Pavia, Italy Ustijana Rechkoska University of Information Science and Technology Shikoska St. Paul The Apostle, North Macedonia Manjeet Rege University of St. Thomas, USA Bernd Rinn ETH Zurich, Switzerland Blagoj Ristevski University St. Kliment Ohridski, North Macedonia Sasko Ristov University Ss.Cyril and Methodius, North Macedonia Witold Rudnicki University of Białystok, Poland Jelena Ruzic Mediteranean Institute for Life Sciences, Croatia David Šafránek Masaryk University, Czech Republic Simona Samardjiska University Ss.Cyril and Methodius, North Macedonia Wibowo Santoso Central Queensland University, Australia Snezana Savovska University St. Kliment Ohridski, North Macedonia Loren Schwiebert Wayne State University, USA Xu Shuxiang University of Tasmania, Australia Vladimir Siládi Matej Bel University, Slovakia Josep Silva Universitat Politècnica de València, Spain Ana Sokolova University of Salzburg, Austria Michael Sonntag Johannes Kepler University Linz, Austria Dejan Spasov University Ss.Cyril and Methodius, North Macedonia Todorova Stela University of Agriculture, Bulgaria Goran Stojanovski Elevate Global, North Macedonia Biljana Stojkoska University Ss.Cyril and Methodius, North Macedonia Ariel Stulman The Jerusalem College of Technology, Israel Spinsante Susanna Università Politecnica delle Marche, Italy Ousmane Thiare Gaston Berger University, Senegal Biljana Tojtovska University Ss.Cyril and Methodius, North Macedonia Yalcin Tolga NXP Labs, UK Dimitar Trajanov University Ss.Cyril and Methodius, North Macedonia Ljiljana Trajkovic Simon Fraser University, Canada Vladimir Trajkovik University Ss.Cyril and Methodius, North Macedonia Denis Trcek University of Ljubljana, Slovenia Christophe Trefois University of Luxembourg, Luxembourg Kire Trivodaliev University Ss.Cyril and Methodius, North Macedonia Katarina Trojacanec University Ss.Cyril and Methodius, North Macedonia Hieu Trung Huynh Industrial University of Ho Chi Minh City, Vietnam Zlatko Varbanov Veliko Tarnovo University, Bulgaria Goran Velinov University Ss.Cyril and Methodius, North Macedonia Elena Vlahu-Gjorgievska University of Wollongong, Australia Irena Vodenska Boston University, USA Katarzyna Wac University of Geneva, Switzerland Yue Wuyi Konan University, Japan Zeng Xiangyan Fort Valley State University, USA Organization xi
Shuxiang Xu University of Tasmania, Australia Rita Yi Man Li Hong Kong Shue Yan University, Hong Kong, China Malik Yousef Zefat Academic College, Israel Massimiliano Zannin INNAXIS Foundation Research Institute, Spain Zoran Zdravev University Goce Delcev, North Macedonia Eftim Zdravevski University of SS. Cyril and Methodius, North Macedonia Vladimir Zdravevski University Ss.Cyril and Methodius, North Macedonia Katerina Zdravkova University Ss.Cyril and Methodius, North Macedonia Jurica Zucko Faculty of Food Technology and Biotechnology, Croatia Chang Ai Sun University of Science and Technology Beijing, China Yin Fu Huang University of Science and Technology, Taiwan Suliman Mohamed Fati INTI International University, Malaysia Hwee San Lim University Sains Malaysia, Malaysia Fu Shiung Hsieh University of Technology, Taiwan Zlatko Varbanov Veliko Tarnovo University, Bulgaria Dimitrios Vlachakis Genetics Lab, Greece Boris Vrdoljak University of Zagreb, Croatia Maja Zagorščak National Institute of Biology, Slovenia
Scientific Committee
Danco Davcev University Ss.Cyril and Methodius, North Macedonia Dejan Gjorgjevikj University Ss.Cyril and Methodius, North Macedonia Boro Jakimovski University Ss.Cyril and Methodius, North Macedonia Aleksandra University Ss.Cyril and Methodius, North Macedonia Popovska-Mitrovikj Sonja Gievska University Ss.Cyril and Methodius, North Macedonia Gjorgji Madjarov University Ss.Cyril and Methodius, North Macedonia
Technical Committee
Ilinka Ivanoska University Ss.Cyril and Methodius, North Macedonia Monika Simjanoska University Ss.Cyril and Methodius, North Macedonia Bojana Koteska University Ss.Cyril and Methodius, North Macedonia Martina Toshevska University Ss.Cyril and Methodius, North Macedonia Frosina Stojanovska University Ss.Cyril and Methodius, North Macedonia
Additional Reviewers
Emanouil Atanassov Emanuele Pio Barracchia Table of Contents
ICT Innovations 2019
Performance Analysis of ECG Processing on Mobile Devices ...... 1 Monika Simjanoska, Bojana Koteska, Marjan Gusev, Ana Madevska Bogdanova
Public NTP Stratum 2 Server for Time Synchronization of Devices and Applications in Republic of North Macedonia ...... 16 Bogdan Jeliskoski, Veno Pachovski, Dobre Blazevski, Irena Stojmen- ovska
Risk analysis and risk management in IT industry...... 31 Aleksandar Kovachev, Vesna Dimitrova
A Review-based Survey of Mobile Student Service ...... 42 Sasho Gramatikov, Aleksandar Spasov, Ana Gjorgjevikj, Dimitar Tra- janov
Rating and Geo-Mapping of Travel Agencies that Provide Vacations to Long-Distance Destinations from Skopje, North Macedonia ...... 58 Dina Gitova, Elena M. Jovanovska, Andreja Naumoski
Using continuous integration principles for managing virtual laboratories .. 67 Dejan Stamenov, Dimitar Venov, Goran Petkovski, Boro Jakimovski, Goran Velinov
Visualization of the nets of type of Zaremba-Halton constructed in gener- alized number system? ...... 76 Vesna Dimitrievska Ristovska, Vassil Grozdanov, Tsvetelina Petrova
Minimizing Total Weight of Feedback Arcs in Flow Graphs ...... 87 Marija Mihova, Ilinka Ivanoska, Kristijan Trifunovski, Kire Trivo- daliev
HPM-CFSI: A High-Performance Algorithm for Mining Closed Frequent Significance Itemset ...... 98 Huan Phan, Bac Le
Object Detection Using Synthesized Data ...... 110 Matija Buric, Goran Paulin, Marina Ivasic-Kos
Comparing Databases for Inserting and Querying Jsons for Big Data .....125 Panche Ribarski, Bojan Ilijoski, Biljana Tojtovska SHELDON Workshop - Solutions for ageing well at home
Telepresence mobile robot platform for elderly car ...... 137 Natasa Koceska, Saso Koceski
BIM and AAL ...... 140 Volker Koch, Maria Victoria Gmez Gmez, Petra von Both
Interactive textiles for designing adaptive space for elderly ...... 143 Amine Haj Taieb, Amal Lajmi
Solutions in built environment to prevent social isolation ...... 145 Veronika Kotradyova
A short history of IoT architectures through real AAL Cases ...... 153 Pedro C. Lpez-Garca, Andrs L. Bleda-Toms, Miguel ngel Beteta-Medina, Jose Arturo Vidal-Poveda, Francisco J. Melero-Muoz, Rafael Maestre- Ferriz
Resource Allocation using network Centrality in the AAL environment ...157 Katerina Papanikolaou, Constandinos Mavromoustakis, Ciprian Dobre
Framework for the development of mobile AAL applications ...... 161 Katerina Papanikolaou, Constandinos Mavromoustakis, Ciprian Dobre SHELDON Workshop - Solutions for ageing well in the community
The use of Technology to promote activity and communication by the municipality services, examples from an urban area ...... 164 Birgitta Langhammer, Ainta Woll, Jeppe Skjnberg
Thermal environment assessment and adaptative model analysis in elderly centers in the Atlantic climate ...... 169 Pedro Torres, Joana Madureira, Lvia Aguiar, Cristiana Pereira, Ar- manda Teixeira-Gomes, Maria Paula Neves, Joo Paulo Teixeira, Ana Mendes
Climate change in the city and the elderlies ...... 172 Emanuele Naboni
Loneliness is not simply being alone. How understand the loneliness in ageing to create smart environments ...... 175 Lucia Gonzlez Lpez, Ana Perandrs Gmez, Alfonso Cruz Lendnez
From Co-design to Industry: investigating both ends of the scale of devel- oping Social Robots ...... 178 Kasper Rodil, Michal Isaacson Thermal comfort perception in elderly care centers. Case study of Lithuania181 Lina Seduikyte, Mantas Dobravalskis, Ugn Didiariekyt
Human machine interaction - Assessment of seniors characteristics and its importance for predicting performance patterns ...... 184 Karel Schmeidler, Katerina Marsalkova
Potential biomarkers and risk factors associated with frailty syndrome in older adults preliminary results from the BioFrail study ...... 187 Armanda Teixeira-Gomes, Solange Costa, Bruna Lage, Susana Silva, Vanessa Valdiglesias, Blanca La↵on, Joo Paulo Teixeira
Consciousness Society creation. Stage development ...... 189 Dumitru Todoroi
Attempts to improve the situation of elderly people in the Republic of Moldova ...... 192 Dumitru Todoroi SHELDON Workshop - Solutions for ageing well at work
Quantitative analysis of publications on policies for ageing at work ...... 195 Jasmina Barakovi Husi, Sabina Barakovi, Petre Lameski, Eftim Zdravevski, Vladimir Trajkovik, Ivan Chorbev, Igor Ljubi, Petra Mareova, Ondrej Krejcar, Signe Tomsone
Can deep learning contribute to healthy aging at work? ...... 198 Petre Lameski, Eftim Zdravevski, Ivan Chorbev, Rossitza Goleva, Nuno Pombo, Nuno M. Garcia, Ivan Pires, Vladimir Trajkovik
Work and social activity for longer healthy life ...... 201 Lilia Raycheva
Reverse mentoring as an alternative strategy for lifelong learning in the workplace ...... 204 Neka Sajini, Andreja Isteni Stari, Anna Sandak
What users want research on workers, employers and caregivers demands on SmartWork AI system ...... 207 Willeke van Staalduinen, Carina Dantas, Sofia Ortet
Healthy aging at work: Trends in scientific publications ...... 210 Eftim Zdravevski, Petre Lameski, Ivan Chorbev, Rossitza Goleva, Nuno Pombo, Nuno M. Garcia, Ivan Pires, Vladimir Trajkovik SHELDON Workshop - Technology Adoption
Challenges in building AI-based connected health services for older adults . 213 Hrvoje Belani, Igor Ljubi, Kristina Drusany Stari Connected health and wellbeing for people with complex communication needs ...... 216 Hrvoje Belani, Kristina Drusany Stari, Igor Ljubi Analysis on competences and needs of senior citizens related to domotics ..219 Tamara Suttner, Paul Held
Comparison of technology adoption models ...... 222 Agnieszka Landowska Choice architecture to promote the adoption of new technologies ...... 225 Dean Lipovac, Michael D. Burnard Swallowing the bitter pill: Emotional barriers to the adoption of medicine compliance systems among older adults ...... 228 Vered Shay, Kuldar Taveter, Michal Isaacson An edge computing robot experience for automatic elderly mental health care based on voice ...... 231 C. Yvano↵-Frenchin, V. Ramos, T. Belabed, C. Valderrama EMBNet Workshop Towards Integration Exposome Data and Personal Health Records in the Age of IoT ...... 237 Savoska Snezana, Blagoj Ristevski, Natasha Blazheska Tabakovska, Ilija Jolevski Complex Network Analysis and Big Omics Data ...... 247 Blagoj Ristevski, Snezana Savoska, Pece Mitrevski Big Data analysis and reproducibility: new challenges and needs of the post-genomics era ...... 253 Aspasia Efthimiadou, Eleni Papakonstantinou, Dimitrios Vlachakis, Domenica DElia, Erik Bongcam-Rudlo↵
Author Index ...... 258 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Performance Analysis of ECG Processing on Mobile Devices
Monika Simjanoska1[0000−0002−5028−3841], Bojana Koteska1[0000−0001−6118−9044], Marjan Gusev1[0000−0003−0351−9783], and Ana Madevska Bogdanova1[0000−0002−0906−3548]
Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University Rugjer Boshkovik 16, PO Box 393, 1000 Skopje, Macedonia {monika.simjanoska,bojana.koteska, marjan.gushev,ana.madevska.bogdanova}@finki.ukim.mk
Abstract. Recently, there is a great focus on preventive medicine and consequently, lots of telemedicine applications have been developed for real-time health monitoring. The monitoring requires a set of biosensors and tablets, smartphones or other mobile devices. The biosensors contin- uously scan and transmit data and the mobile devices have the capacity to process the data and at the same time to save the battery as much as possible. In this paper, we assess the performance of an algorithm for ECG derived heart rate and respiratory rate. Three different scenar- ios are inspected encompassing local, remote Linux server, and remote MatLab⃝c server processing of the locally collected ECG signals by using the biosensors attached to a person. We set a hypothesis that processing locally instead of remotely is worse in terms of time and power con- sumption. Thus, developing a web service for processing will be best in both time and battery consumption. A total of 1297 signals havebeen processed. The results show that the mobile device is appropriate for processing ECG files up to approximately 5MB size. The processingof larger files spends too many resources when compared to the remote pro- cessing solution. Additionally, we have proved that splitting large files in chunks up to 4 MB per chunk resulted in faster processing for the local scenario.
Keywords: ECG · Algorithm · Signal processing · Mobile Platform.
1 Introduction
Preventive rather than curative medicine has brought small laboratoriesinto people’s own devices. Biosensors are no longer high-cost equipment available only to the hospital units or to the great research centers. Instead, there is a trend of developing smaller units to measure the persons’ vital parameters without degrading their comfortability. Those biosensors are able to capture the persons’ signals from various sources such as heart electrical activity (ECG), heart sounds (PCG), oxygen saturation level (PPG), brain electrical activity
1 1 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
(EEG), etc. The signals are continuously transmitted and consequently need to be processed in real-time to extract reliable data of the current health condition. ECG is one of the most popular physiological signals that can accurately provide an insight into the individual’s heart rate, respiratory rate, and even blood pressure (mostly combined with other signals). This ability makes the ECG to be widely used in cases of rapid diagnosis when no other equipment is available, meaning it is suitable to use a mobile device as a processing unit [10]. However, the up-to-date mobile devices still lack the storage and processing power of the ”big” computers and therefore, we must take into account thelimits of the storage, processing, and battery when creating algorithms for real-time ECG signals processing. Usually, the biosensors do not have their own memory and these data are sent to a portable electronic device located close to the sensor (for ex. mobile phone or tablet). A mobile device can receive real-time data from several biosensors attached to a patient and are used to monitor the health condition. This is a typical case of measuring vital parameters when a patient wears a portable biosensor that monitors ECG. The final goal is to monitor the health condition and to save the phone battery as much as possible. In order to provide an accurate and fast algorithm performance preserving a long battery life, the huge dilemma is where to execute the algorithm: locally on the mobile device or on a remote server. For example, if a patient wantsto check the heart rate and respiratory rate after every 30 minutes, the dilemma is: to send the ECG data from the customer’s phone to a remote server and execute the algorithm remotely or to execute the algorithm on the customer’s phone. We set a hypothesis that local processing performs worse than remote Linux server and remote MatLab⃝c processing in terms of storage, computation, and consequently battery consumption. It is expected that developing a web service will provide faster processing time; however, the effect of the server response times on the energy consumption of mobile clients [16] still has to be investigated. In order to confirm the hypothesis, we conducted an experiment to evaluate the algorithm’s performance and battery consumption of the mobile device. Three scenarios are included: local processing on a mobile device, own remote Linux-hosted web service, and a remote MatLab⃝c -hosted service. Further on, we set up a research problem to find out the most time effi- cient and battery efficient way for real-time processing of an ECG signal and calculating ECG derived heart rate and respiratory rate. The experiment includes testing of various ECG signal lengths from three dif- ferent databases relied on different ECG sources in terms of sampling frequency, including Cooking Hacks e-Health Sensor Platform [11], Savvy Sensor Platform [19] and Physionet [23]. The rest of the paper is organized as follows. Related work in the area of ECG processing algorithms designed for portable devices is given in Section 2. Section 3 presents the testing methodology, plan and infrastructure. The results from the experiments are described in Section 4. Section 5 is dedicated to a discussion of the outcome and time processing trade-off, comparison of theresults
2 2 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
and analysis which environment provides the best processing time and longest battery life. The conclusion and future work are specified in Section 6.
2RelatedWork
Parak and Havlik stress the importance of developing an approach that saves computing time, aiming to achieve suitable microprocessor implementation [17]. Elgendi at al. [7] recommend a first derivate of the flittered ECG signal for long-term processing of ECG recordings and ECG large databases since it is highly numerically efficient for QRS detection phase. Elgendi presented a mobile phone application for QRS complex identification out of large recorded ECG signals and achieves a processing time of 2.2 seconds for an ECG signal of 130 minutes [6]. A novel mobile-friendly approach for detection of QRS complexes from a high fidelity ECG real-time monitoring is provided in [2]. The authors used continuous wavelet transform which resulted in lesser computation time compared to Pan- Tompkins algorithm. Li at al. [14] proposed a low power and efficient QRS processor for real-time ECG monitoring based on harr wavelet transform and optimized FIR filter structure. Power consumption is only 410 nW in 1 V voltage supply. Lahiri et al. proposed an algorithm for real-time compressing ECG wave- forms and send them through a mobile phone. They compare their algorithm performance with several existing compression algorithms and concluded that their algorithm can achieve good compression with low distortion on the re- ceiving side [13]. Gia at al. [8] analyzed ECG signals in smart gateways with wavelet transform mechanism. Their experimental results showed that fog com- puting helped to achieve more than 90% bandwidth efficiency and low-latency real-time response at the edge of the network. Mazomenos et al. [15] introduced a low-complexity algorithm for fiducial points from ECG designed for mobile healthcare applications. They have applied discrete wavelet transformation as a principal analysis method. Their algorithm was applied to 477 ECG recordings and achieved an ideal trade-off between computational complexity and performance. Ibaida et al. [12] proposed a compression algorithm to minimize the traffic going to and coming from cloud ECG compression. Their technique achieved a higher compression ratio of 40 with lower Percentage Residual Difference (PRD) Value less than 1%. Wang et al. [22] developed a cloud-based system for per- forming massive ECG data analysis. Their system showed up to 38x performance improvement and 142x energy efficiency improvement compared to commercial systems. A cloud system providing real-time ECG data acquisition, transmis- sion and analyzing over the Internet was presented in [24]. Evaluation of this system showed that it has good performance in terms of real-time, scalability and latency. A low-cost wireless ECG data acquisition system capable of acquiring up to 22 hours of continuous ECG data is described in [1]. The authors use Arduino
3 3 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Uno and Bluetooth serial communication to real-time ECG data transfer. Their system is portable and battery powered. Wand et al. [21] proposed a mobile-cloud computational solution for ECG telemonitoring. Their approach enhances the mobile medical monitoring in terms of accuracy, execution and energy efficiency. Pineda et al. [18] presented a portable system that enables the acquisition, processing, storage, and transmission of ECG signal. Their system can record ECG data over 120 hours in local mode, up to 10 days of ECG data could be stored in the phone in the cellular mode. For the remote signal visualization, the average delay of the packets was less than 1.73ms, with a mean power consump- tion of 0.48 w/h, using a battery of 2 A/h.
M ECG signals Process
Push ECG files ECG signals Process L Results
W ECG signals Send ECG files ECG files
Function call Get ECG files Results
Process
Fig. 1. The three architectural scenarios to conduct the testing
4 4 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
3 Testing Methodology
This section presents the testing environments, test cases, and test data.
3.1 Test Goal
The test goal is to determine the most efficient scenario in terms of performance (execution time) and resource saving (battery) to execute an ECG-derived heart rate and respiratory rate algorithm. We implemented the ECG-derived heart rate and respiratory rate algorithm relying on the idea presented by Ding et al. [5]. The algorithm accepts two pa- rameters: a file path where the ECG signal is stored and the sampling frequency at which the signal is collected (this option is necessary since various biosensors work with different sampling rates). The pseudo code of the algorithm is presented in Fig. 2.
1: Input: signal, frequency 2: Output: HR, num respirations, RR 3: function H3R(signal[], frequency) 4: num of peaks = Pan Tompkins (signal, frequency) 5: seconds = round(length(signal)/freq) 6: HR = 1/(seconds/num of peaks) ∗ 60 7: for i =1tonumof peaks-1 do 8: k = kurtosis(signal[peak[i]:peak[i +1]) 9: smoothed k=smooth(k, rlowess) 10: RR num of peaks = findpeaks(smoothed k) 11: num respirations = length(RR num of peaks) 12: RR = 1/(seconds/breaths) ∗ 60
Fig. 2. A pseudo code of the ECG-derived heart rate and respiratory rate algorithm
3.2 Test Scenarios
Figure 1 presents the testing environment and the architecture of three scenarios, according to the location where the processing is realized.
– Local mobile-only scenario (M), where the algorithm is executed on the An- droid mobile device. The algorithm is implemented as a C source code by using the Android Native Development Kit (NDK). – Remote Linux server scenario (L) where a shell script is used to transfer the locally collected ECG signals stored on the mobile device to a Linux server to execute the algorithm on the server. In this scenario, the C algorithm source code is compiled and executed on the Linux machine. The shell script is executed on the Android phone by using an Android terminal application.
5 5 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
– Remote MatLab⃝c web service scenario (W),wheretheprocessingisper- formed on the recently developed MatLab⃝c Mobile application for Android and connected it to a Windows server with MatLab⃝c installed. The locally collected ECG signals stored on the mobile device have been automatically synced to the server and accessed by the MatLab⃝c based version of the algorithm. In this scenario, we used the originally developed algorithm’s MatLab⃝c version.
3.3 Testing Environment
We use Samsung Galaxy Tab S2 - tablet with OS Android 6.0 (Marshmallow) as a testing environment for the experiments. This tablet has 3GB RAM, 64GBinter- nal memory, Octa-core (4x1.9 GHz & 4x1.3 GHz) processor and non-removable Li-Ion 5870 mAh battery. In the scenario M, the algorithm is executed on the tablet, with a specially developed Android application. The algorithm C source code is added to the Android application by compiling it into a native library using Android Devel- opment Studio that Gradle can package with the APK file. Then, in the Java code, the functions in the native library can be called through the Java Native Interface (JNI). Our application only makes a call to the algorithm function by providing two parameters: the ECG recording file path from the tablet storage and the frequency (sampling rate at which the ECG signal is collected). The scenario L is based on sending the ECG recordings to a remote machine in a local computer network for the further processing. We have created a shell script that connects to the remote server, transfers the recording files to a remote server for further processing and stores the file transfer time stamp. In order to calculate the CPU time used by each process, we sum up the User and Systime. The remote server is a Linux machine (OS-CentOS 7.2) with octa-core processor and 8GB RAM. In order to execute the shell script on the Android tablet,we used the Android Terminal Emulator application which provides access to the Android’s built-in Linux command line shell. The scenario W uses computation offload by sending ECG recordings to a remote machine for the further processing. In this scenario, we used the Android MatLab⃝c Mobile application [20] and connect it to a Windows server where MatLab⃝c software is being installed. The windows server (OS Windows 7) has a quad-core processor (1.80GHz) with 4GB RAM. We use Tonido Server personal cloud [4] which allows us to transfer the files from the tablet to the remote server. The ECG recordings stored on the tablet are automatically syncedto the server by the Android Tonido application [3]. Then, they are accessed by the MatLab⃝c based version of the algorithm running on the server. In this scenario, we used the originally developed algorithm’s MatLab⃝c version. Tablet battery consumption is evaluated by using the Android GSam Battery Monitor application [9]. The battery is fully charged at the beginning of each test scenario.
6 6 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
3.4 Test Cases
To cover variable lengths and sources of ECG, we used three different databases and a total of 1297 ECG recordings divided in the following categories:
1. Short term measurements - small files. A total of 1191 ECG recordings with varying length from 10 to 90 seconds at a sampling rate of 125Hz are created by using the Cooking Hacks e-Health Sensor Platform. This platform allows Arduino and Raspberry Pi to interact with the shield and be used for bio- metric and medical applications. The ECG Sensor in this set is a single-lead ECG which uses three ECG electrodes. These basic leads acquire enough information for rhythm-monitoring, but are insufficient for determination of ST elevation (since there is no lead that gives information about the anterior wall) [11]. 2. Medium term measurements - medium file length. A total of 22 ECG record- ings with varying length from 300 to 800 seconds at 125Hz are created by using the Savvy Sensor Platform. SAVVY ECG is a portable medical device for continuous and accurate monitoring of heart rhythm. It is a comprehen- sive and reliable solution for patients with cardiovascular diseases, athletes, the elderly and for people with an active lifestyle who want to monitor their own health [19]. 3. Large term measurements - large file length. A total of 84 ECG recordings with varying length from 2860 to 5170 seconds at 360Hz are obtained from the Massachusetts General Hospital-Marquette Foundation Hemodynamic and Electrocardiographic Database (MGH/MF) Physionet database [23].
Our test cases include files of different size from all databases: starting from 4KB up to 52KB from the first database, 59KB up to 9,571KB from the second database, and 8655KB up to 17273KB from the third database. A total of 1.65 GB data have been processed in each experiment.
3.5 Test Data
The measured data in the experiment is the execution time and battery con- sumption. Further on, we calculate the average algorithm time efficiencyperfile size and total algorithm time efficiency for all files. In order to calculate the average algorithm time efficiency, we split thefiles into categories according to the file size and find the average algorithm perfor- mance behavior. The total algorithm time efficiency is the sum of the algorithm time performances for all 1297 files. The average algorithm time efficiency and total algorithm time efficiency are calculated separately for each testing scenario. The total execution time is defined by (1) where Ti is the execution time for each file i and n is the total number of files.
n Ttotal = ! Ti (1) i=1
7 7 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Tavg is average execution time for each unique file size in KB, defined by (2), where k is the number of files of equal file size s.
k T (s) T (s)="i=1 i (2) avg k
1
0.5
0
4 6 8 10 12 14 16 18 20 22 24 26 28 30 34 36 42 59 132 220 294 777 903 -0.5
-1
-1.5 Logaverage processing time -2
-2.5 File size (KB) M W L
-3
Fig. 3. Average execution times (small files category).
2.3
1.8
1.3
0.8
0.3 Logaverage processing time
-0.2 1246 1422 1657 2551 2825 2961 3284 4690 7478 7827 8655 M W L File size (KB) -0.7
Fig. 4. Average execution times (medium files category).
Note that the average times TavgM , TavgL and TavgW are calculated for each scenario M, L, and W correspondingly.
8 8 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
2.3
1.8
1.3
0.8
0.3 Logaverage processing time
-0.2 9544 9570 10450 10541 10889 10894 11504 11556 12210 12898 12909 12914 13479 14244 14317 14318 14554 14710 14769 14871 15253 15398 15459 15746 16096 16441 16624 17273 M W L File size (KB) -0.7
Fig. 5. Average execution times (large files category).
We assume the scenario M is the referent one and compare the results ob- tained for L and W scenarios. The relative speedup RP is a ratio of average execution times for each pair of test scenarios (3) by calculating RPML and RPMW correspondingly.
TavgM (s) TavgM (s) RPML = RPMW = (3) TavgL(s) TavgW (s)
Additionally, we calculate the speedup RPLW by (4) to compare the scenarios L and W.
TavgL(s) RPLW = (4) TavgW (s) The total battery consumption is calculated for 1297 processed files per each of the M, L and W scenario. The total battery consumption is defined by (5) where Bi is execution time for each file i and n is the total number of files.
n Btotal = ! Bi (5) i=1
4 Experimental Results
This section presents the results obtained from the defined testing scenarios.
4.1 Total performances
A total of 1.65GB data is processed and the summary results are provided in Table 1.
9 9 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Table 1. Summary of the execution time and battery consumption.
Scenario Ttotal (sec) Btotal in (%) M 10551.06 91 W 7715.07 36 L 7524.89 30
The first column presents each scenario, the second shows total execution time per scenario, and the third represents the total battery consumption. The average execution time has been calculated for each of the distinct file sizes ranging from 4KB - 17MB. Analyzing the results, the scenario M is the worst case where the total time for processing the ECG files is 10551 seconds and 91% of the device battery is spent. Even though the processing times for scenarios W and L do not differ significantly (W performs 2.5% worse than L); however, the battery consumption at W case is ∼16% worse when compared to the L scenario. One can conclude that the ECG data transfer to a remote execution of the algorithm on the Linux server to be the fastest and most efficient scenarioin terms of battery consumption.
4.2 Dependence on file size
We have inspected the execution times in all three scenarios considering the size variability of the ECG files. To represent the very small execution times versus the huge values, we use a 10-base logarithmic normalization of the average execution times per distinct ECG file size. Considering the file sizes available for our research, we can clearly distinguish three categories which we refer to: a small files category containing files less than 1MB in size, a medium files category with sizes ≥ 1MB and ≤ 9MB, and a large files category with files ≥ 10MB. Figures 3, 4 and 5 present the normalized average execution times (y-axis) for each file size in each category (x-axis) correspondingly for the M, L and W scenarios. The figure clearly shows that the M scenario performance follows a logarith- mic increasing trend. Although it performs extremely better for filesizesupto 5MB, it is worse for the larger files sizes. Considering the L and W scenarios, it can be perceived that they both follow the same performance trends propor- tional to the rise of the file size. Even though the scenario M is ∼1.3 times faster than the scenarios W and L for files up to 5MB, it is ∼2 times slower for large files and considering the absolute values of the difference in times to be mea- sured in hundreds and thousands of seconds. The scenario M is not acceptable for unlimited ECG files sizes in terms of processing time.
10 10 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
1
4 6 8 -1 10 12 14 16 18 20 22 24 26 28 30 34 36 42 59 132 220 294 777 903
-3
-5 Logspeedup -7
-9 File size (KB) R(M,W) R(M,L) R(L,W) -11
Fig. 6. Relative speedup (small files category).
2
1.5
1
0.5
0
-0.5 1246 1422 1657 2551 2825 2961 3284 4690 7478 7827 8655 Logspeedup -1
-1.5
-2 File size (KB) R(M,W) R(M,L) R(L,W) -2.5
Fig. 7. Relative speedup (medium files category).
2
1.5
1
0.5
0
-0.5 9544 9570 10450 10541 10889 10894 11504 11556 12210 12898 12909 12914 13479 14244 14317 14318 14554 14710 14769 14871 15253 15398 15459 15746 16096 16441 16624 17273 Logspeedup -1
-1.5
-2 File size (KB) R(M,W) R(M,L) R(L,W) -2.5
Fig. 8. Relative speedup (large files category).
5 Discussion 5.1 Speedup analysis Figures 6, 7 and 8 present the relative speedup for all pairs of scenarios corre- spondingly. To represent the high range of the values we also use a logarithm scale with base 10.
11 11 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Table 2. Comparison of processing times considering chunks.
File size #chunksChunk size TM (sec) TW (sec) TL (sec) 5MB 1 5MB 25.78 25.32 24.18 5MB 2 2.5MB 17.98 36.80 35.14 5MB 5.000 1KB 0.31 2505.90 2393.20 1GB 1 1GB 80954.00 290.47 277.40 1GB 2 500MB 56466.00 422.18 403.18 1GB 200 5MB 5156.80 5063.70 4835.80 1GB 250 4MB 4592.10 5711.50 5454.40 1GB 1.000.000 1KB 61.66 501190.00 478630.00 60GB 1 60GB 40785000.00 1914.10 1828.00 60GB 2 30GB 28448000.00 2782.10 2656.90 60GB 12.000 5MB 309410.00 303820.00 290150.00 60GB 15.000 4MB 275530.00 342690.00 327270.00 60GB 60.000.000 1KB 3699.60 30071000.00 28718000.00
One can observe that there is a linear speedup of M vs W and M vs L for file sizes approximately up to 5MB large. Above this size value, the speedup is stabilized and non-increasing. Analyzing the relative speedup of L vs. Wsce- nario, we can see that W is never better than L and the value is approximately 0, except for small range where L scenario is better, leading to a conclusion that there is no significant difference between the execution times.
5.2 File size dependence
Table 2 shows the processing times for the three scenarios would take 40785000 (M), 1914 (W) and 1828 (L) seconds. The processing times if we have divided the file in 2 chunks of 30GB each, would decrease to 28448000 seconds in the M scenario. However, it would increase to 2782 and 2656 seconds in W and L scenarios correspondingly. Considering 5 MB to be the threshold at which the processing times for all the three scenarios are expected to be nearly equal, dividing a 60GB file in 12000 chunks of 5MB each has resulted in the expected outcome of nearly same processing times. In order to prove that if the chunk size is less than 5MB we have split the large file into 15000 chunks of 4MB each and obtained fastest processing time for the M scenario. Even more, considering the smallest chunk size to be 1KB, the results also showed that the processing time is best in favor of the M scenario. Testing also for 1GB and 5MB file sizes we can conclude the following:
1. If there is no Internet connection available, files must be processed locally on the mobile device (scenario M) and should be split into chunks of the smallest size of ∼1KB; 2. If an Internet connection is available, files larger than 5MB should be sent to the remote web service (scenario L);
12 12 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
3. If an Internet connection is available, files less than 5MB should be split into chunks of the smallest size of ∼1KB and processed locally on the mobile device (scenario M).
6 Conclusion and Future Work
In this paper, we have analyzed the performance of ECG processing on mobile devices, including smartphones, tablets or similar devices. The analysis of the performance includes response time, battery consumption and resilience analysis of an algorithm for ECG derived heart rate and respira- tory rate. To make relevant conclusions we have realized three experiments for processing ECG files with various lengths, each of them providing a different execution environment. In the first experiment, files were processed locally, on a portable device. In the second experiment, files were processed remotely using a web service, and in the third files were processed remotely on the MatLab ⃝c server. The target goal was to analyze how to choose a more performance-effective and battery-effective solution. The hypothesis that the mobile-only solution per- forms worse than offloading solutions is valid only for files larger than 5MB. However, if a large file is split in chunks, then the mobile-only solution performs better. If the files were split into chunks which size is less than 5MB thenprocessing them locally is also acceptable even for large files. This conclusion does not apply to battery consumption, and in these cases, one needs to offload the computation in order to preserve the battery. We have modeled the performance behavior and compared the experimental and theoretical results. Considering the actual measurements, the theoretical (predicted) response time results have shown to follow the trend of the real response times. The experiment results solve the customer’s dilemma which option provides better performance and is more battery-effective, when processing ECG files with various size. We will continue to realize the same and similar experiments with different mobile devices.
References
1. Ahamed, M.A., Asraf-Ul-Ahad, M., Sohag, M.H.A., Ahmad, M.: Development of low cost wireless ECG data acquisition system. In: Advances in Electrical Engi- neering (ICAEE), 2015 International Conference on. pp. 72–75. IEEE (2015) 2. Amiri, A.M., Mankodiya, K., et al.: m-: An efficient QRS dQRSetection algorithm for mobile health applications. In: E-health Networking, Application & Services (HealthCom), 2015 17th International Conference on. pp. 673–676. IEEE (2015) 3. CodeLathe Technologies Inc: Tonido file access share sync, https://play.google.com/store/apps/details?id=com.tonido.android
13 13 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
4. CodeLathe Technologies Inc: Tonido server software, https://www.tonido.com/ 5. Ding, S., Zhu, X., Chen, W., Wei, D.: Derivation of respiratory signal from single- channel ECGs based on source statistics. International Journal of Bioelectromag- netism 6(2), 43–49 (2004) 6. Elgendi, M.: Fast QRS detection with an optimized knowledge-based method: Eval- uation on 11 standard ECG databases. PloS one 8(9), e73557 (2013) 7. Elgendi, M., Eskofier, B., Dokos, S., Abbott, D.: Revisiting QRS detection method- ologies for portable, wearable, battery-operated, and wireless ECG systems. PloS one 9(1), e84018 (2014) 8. Gia, T.N., Jiang, M., Rahmani, A.M., Westerlund, T., Liljeberg, P., Tenhunen, H.: Fog computing in healthcare internet of things: A case study on ECG feature ex- traction. In: Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive In- telligence and Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference on. pp. 356–363. IEEE (2015) 9. GSam Labs: Gsam battery monitor, https://play.google.com/store/apps/details?id =com.gsamlabs.bbm&hl=en 10. Gusev, M., Stojmenski, A., Guseva, A.: Ecgalert: A heart attack alerting system. In: International Conference on ICT Innovations. pp. 27–36. Springer (2017) 11. Hacks, C.: e-health sensor platform v2. 0 for arduino and raspberry pi [biomet- ric/medical applications] (2013) 12. Ibaida, A., Al-Shammary, D., Khalil, I.: Cloud enabled fractalbasedECGcom- pression in wireless body sensor networks. Future Generation Computer Systems 35,91–101(2014) 13. Lahiri, K., Mandal, A., Sinha, S., Mukherjee, A.: Wireless personal health ECG monitoring application. In: Wireless Personal Multimedia Communications (WPMC), 2014 International Symposium on. pp. 618–623. IEEE (2014) 14. Li, P., Jiang, H., Yang, W., Liu, M., Zhang, X., Hu, X., Pang, B., Yao, Z., Chen, H.: A 410-nw efficient QRS processor for mobile ECG monitoring in 0.18-µmcmos. In: Biomedical Circuits and Systems Conference (BioCAS), 2016 IEEE. pp. 14–17. IEEE (2016) 15. Mazomenos, E.B., Biswas, D., Acharyya, A., Chen, T., Maharatna, K., Rosen- garten, J., Morgan, J., Curzen, N.: A low-complexity ECG feature extraction al- gorithm for mobile healthcare applications. IEEE journal of biomedical and health informatics 17(2), 459–469 (2013) 16. Miettinen, A.P., Nurminen, J.K.: Energy efficiency of mobile clients in cloud com- puting. HotCloud 10,4–4(2010) 17. Par´ak, J., Havl´ık, J.: ECG signal processing and heart rate frequency detection methods. Proceedings of Technical Computing Prague 8,2011(2011) 18. Pineda-L´opez, F.M., Mart´ınez-Fern´andez, A., Rojo-Alvarez,´ J.L., Blanco-Velasco, M.: Design and optimization of an ECG/holter hybrid system for mobile systems based on dspic. In: Computing in Cardiology Conference (CinC), 2014. pp. 453– 456. IEEE (2014) 19. Saving: Monitor the activity of your heart, http://www.savvy.si 20. The MathWorks, Inc.: Matlab mobile, https://play.google.com/store/apps/details? id=com.mathworks.matlabmobile&hl=en 21. Wang, X., Gui, Q., Liu, B., Jin, Z., Chen, Y.: Enabling smart personalized health- care: a hybrid mobile-cloud approach for ECG telemonitoring. IEEE journal of biomedical and health informatics 18(3), 739–745 (2014)
14 14 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
22. Wang, X., Zhu, Y., Ha, Y., Qiu, M., Huang, T.: An fpga-based cloud system for massive ECG data analysis. IEEE Transactions on Circuits and Systems II: Express Briefs (2016) 23. Welch, J., Ford, P., Teplick, R., Rubsamen, R.: The massachusetts general hospital-marquette foundation hemodynamic and electrocardiographic database– comprehensive collection of critical care waveforms. Clinical Monitoring 7(1), 96–97 (1991) 24. Zhou, S., Zhu, Y., Wang, C., Gu, X., Yin, J., Jiang, J., Rong, G.: An fpga-assisted cloud framework for massive ECG signal processing. In: Dependable,Autonomic and Secure Computing (DASC), 2014 IEEE 12th International Conference on. pp. 208–213. IEEE (2014)
15 15 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Public NTP Stratum 2 Server for Time Synchronization of Devices and Applications in Republic of North Macedonia
Bogdan Jeliskoski1, Veno Pachovski1, Dobre Blazevski1, Irena Stojmenovska1
1 University American College Skopje, III Makedonska brigada br. 60, 1000 Skopje, Republic of North Macedonia [email protected], {pachovski, dobre.blazevski, stojmenovska}@uacs.edu.mk
Abstract. With the development of new hardware, applications and Internet of Things (IoT) devices, accurate time synchronization is more than necessary. Having close geolocation Network Time Protocol (NTP) server for hardware devices or the applications needed to synchronize time, means accurate time synchronization, near the atomic time with fast server response. In this research paper, we explain creation of first public NTP Stratum 2 server in Republic of North Macedonia upon the latest VMware virtualization technology, which will serve the territory of Republic of North Macedonia and rest of Europe. NTP server will be satisfying all needed standards. Registration in official NTP worldwide register is already conducted and the server is available for all in the world. Keywords: network time synchronization NTP public virtualization VMware ESXi server networking
1. Introduction
The scope of the project is to create a stable and reliable public NTP Stratum 2 server for Republic of North Macedonia and rest of Europe. Currently there is no other NTP server dedicated for serving the region of Republic of North Macedonia. The main goal of this project would be better timing services to the clients, and to offload the utilization of the ISPs’ (Internet Service Providers) main links towards their global routing peers and main uplinks. Prior this project there was no other public NTP server available in the state, all personal computers, laptops, cellphones and other
16 16 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
devices and applications that are using SNTP/NTP were forced to use external NTP servers outside the state. With local NTP server publicly available, all the NTP connections would terminate at our NTP server, which would drastically improve the global network utilization and provide better time service for all clients. The NTP v4 protocol as defined by RFC-5905-Specification for protocol and algorithms is being used in this project, which is backward compatible with version 3. Version 4 of NTP protocol overcomes very significant and important deficiencies in design, compared to Version 3 of NTP, correcting specific errors that could occur, and involving new features in extended NTP time signatures, encouraging the use of fluctuating dual data types during the synchronization procedure [1]. As a result, the time resolution is better than a nanosecond, and the resolution of the frequency is less than one nanosecond per second [1]. Additional improvements include a new clock discipline algorithm that is more responsive to the hardware's frequency of the system. Typical primary servers using modern machines are precise within a few dozen microseconds. Typical secondary servers and clients on fast local networks are within a few hundred microseconds with intervals of up to 1024 [1]. With NTP Version 4, servers and clients are precise with polling intervals of up to 36 hours. For the purpose of this project we will first address NTP protocols and their security and then we will briefly introduce hierarchy of the SNTP Servers. Then we will explain our project setup including hardware requirements, installation and configuration of VMware ESXi Hypervisor, installation and configuration of Fedora 30 Operating System and backup, Network and firewall setup, configuration of advanced DNS settings for ntpmk.net hostname, as well as results from the testing.
2. Versions of NTP Protocol
First NTP implementation started in 1970 with accuracy of several hundred milliseconds. The first specifications were released in official RFC 778, but with different name - Internet Clock Service, where delay measurements of any host could be done by sending ICS datagram. The official introduction was in RFC 958 in 1985, where packets seeing on the networks are described and basic calculations were involved [2-4]. The Version 1 of NTP protocol was released in 1988 in RFC 1059. In this version, a symmetric operation and client-server mode was introduced. Version 1 was built on the IP protocol and User Datagram Protocol (UDP), to provide a connectionless transport from source to destination. It was designed for maintaining accuracy and robustness [2, 5]. Symmetric-key authentication with DES-CBC was introduced in the Version 2 of NTP protocol. DES is a symmetric key algorithm, with key size of 64 bits. This mechanism must derive a raw key with 64-bit value, and this process is not responsible for parity or weak key checks. Additional checking for weak keys and rejecting if found would be performed. Requesting for new SA would be done accordingly [6]. This version of NTP was officially published by IETF in RFC 1119 [2]. This RFC is obsoleted by the next generation of NTP, Version 3, described in RFC 1605 [6, 7].
17 17 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Version 3 of the NTP protocol was published in 1992 in RFC 1305. In this version, introduction of esterror and maxerror formal correctness principles were specified and implemented, and revision of algorithms for optimized work combining the offsets of a number of peer timeservers was performed. In addition, broadcast mode was also implemented [2, 6]. A client sends an NTP request or message to one or more servers and replies are processed as received. The server will locally change addresses and ports and overwriting on some fields will be done in the message. Recalculation of checksum is done, and the message is returned to the client. With this message, a client can determine the server time and adjust the local clock. Information about calculation for choosing best servers and expected timekeeping accuracy and reliability are stored in the same message [2, 7]. New algorithm is presented in Version 3 of NTP to combine offsets of a number of peer time servers, modeled by the same one used by national standards laboratories for combining the weighted offsets from a number of standards clocks to construct a synthetic laboratory scale. This new algorithm is used for improving the accuracy, stability and reducing of errors, because of the asymmetric paths that exist in the Internet [7].
2.1 SNTP PROTOCOL VERSION 4
Version 4 is the latest version of the NTP protocol as defined in RFC 2030 named Simple Network Time Protocol (SNTP) Version 4 for IPv4 and OSI [8]. Protocol and algorithms are defined in RFC 5905. Reinterpretation of header fields is done in this version for support of IPv6 and OSI addressing model. In addition, optional anycast extensions are implemented for multicast service [8]. Main difference between Version 3 and Version 4 of NTP protocol is the interpretation of four octet reference identifier field, used for detection and avoid of synchronization loops [8]. SNTP Version 4 can work in unicast mode (point-to-point), anycast mode (multipoint to point) and multicast mode (point to multipoint) [8, 9]. In the Fig. 1, a common NTP format is presented, with all parameters.
Fig. 1. Common format of NTP packet [10]
18 18 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
As reference clock, the NTP is using Coordinated Universal Time (UTC) as a distinct from Greenwich Mean Time (GMT). International Atomic Time (TAI) time standard is used by UTC, and based on the measurements of 1 seconds as 9,192,631,770 periods of the radiation emitted by a caesium-133 atom in the transition between the two hyperfine levels of its ground state, meaning NTP protocol must implement leap second adjustments from time to time [10, 11]. Transmission delay can be calculated as total time from transmission of the chosen pool to reception of the response, minus recorded time for the server, processing the pool and to generate a response: δ = (T4 – T1) – (T3 – T2) [10]
Client’s offset clock from the server clock can be estimated: θ = ½ [(T2 – T1) + (T3 – T4)] [10]
2.2 Security Analysis of NTP Protocol
In 2016, the NTP servers due to security exploit suffered large scaled attack. As of this, ISPs and IXPs have decided to block NTP traffic to the abused servers. If servers are not patched correctly, only 5.000 servers are needed to generate a massive scale Distributed Denial of Services (DDoS) attack with around 400 Gbps [12]. The methods used for DDoS attacks are not because of exploits of NTP protocols or NTP daemon, but mainly because of poor configuration, and not using proper firewall configuration. The NTP servers are main target for attackers because today as stated, every personal computer, laptop or cellphone have built NTP client. Time synchronization on systems is the biggest factor to further vulnerabilities that can compromise the server or the client [12]. There were cases were NTP was used to manipulate logs and changing time on computer systems for changing the events [12]. The following can be used as vulnerability to attack NTP protocol [13]: Client/Server, Symmetric Modes, Rate Management Provisions, Digest layer of the Message’s Symmetric Key Cryptography, Autokey Protocol and Modification of NTP Header. Because NTP operates with UDP, thus does not need protocol handshake nor record for this request, primary exploit are DDoS attacks. Upgrading to the latest 4.2.7 version of NTP daemon is highly recommended [13, 14]. However, routers and firewalls are key factor for securing and optimizing the NTP servers, as will be discussed later.
3. SNTP Servers
There are two time NTP servers, primary and secondary. A primary timeserver or Stratum 1 server receives and UTC time signal directly from authoritative clock source such as atomic clock or GPS signal source. A secondary timeserver or Stratum 2 server receives time signal from one or more upstream NTP servers, while distributing time signals to other servers or clients. The chain of Stratum servers continues on. Stratum 3 server receives time signal from Stratum 2. Stratum n server can peer with many stratum n-1 servers, to maintain reference clock signal [11]. Stratum hierarchy can be seen in Fig. 2.
19 19 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Fig. 2. Detailed hierarchy of the Stratum servers
4. NTPMK – Requirements and Setup In this section the overall requirements and setup will be presented. 4.1 Hardware reqiurements The hardware requirements are considered for the VMware Hypervisor, because every hardware component must be compliant with the software. Main server hardware used is described in the following Table 1:
TABLE I: Main hardware components for the NTP Server. Price Component Manufacture Complaint with VMware ESXi Motherboard ASUS M5A78L LE AMD 760G-780L / SB710 YES 50 EUR CPU AMD FX-8350 Eight-Core Processor YES 60 EUR RAM 16 GB 1600 MHz YES 60 EUR HDD/SDD 1 x 1TB Samsung HDD Pro 1 x 500 WD Blue YES 100 EUR Power Supply Cooler Master WAT Lite 700W YES 50 EUR Network cards 2 x Intel NIC Pro 1000 YES 50 EUR UPS Smart ONLINE 3 KVa AVR and OVR YES 150 EUR Router/Firewall Mikrotik RB2011 YES 100 EUR L2 Switch Cisco SG300 YES 180 EUR / / TOTAL: 1420 EUR
20 20 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
4.2 VMWARE ESXI 6.7 U2 Hypervisor configuration
VMware ESXi 6.7 U2 Hypervisor was used for this project. The ESXi is bare metal operating system, designed specifically for handling virtual machines with high throughput and load. Core features of ESXi includes [15]: Memory over-commitment and de-duplication, scalability up to 64 logical processing cores, 256 virtual CPUs and 1 TB of RAM per host, memory scalability, shaping of network traffic, using VLANs, NIC Card teaming, command line esxcli. In Fig. 3, a brief core design of VMware ESXi can be seen, and on Fig. 4 an actual summary tab of VMware ESXi is shown.
Fig. 3. VMware ESXi core structure
Fig. 4. An actual screen of VMware ESXi 6.7 U2 used in this project
For better security there are two network cards on the ESXi infrastructure server. NIC0 is for the production network as shown in Fig. 5, where all VM servers have access to the main router, and NIC1 just for management purposes, as can be seen in Fig. 6.
21 21 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Fig. 5. An actual screen of production network in ESXi server
Fig. 6. An actual screen of management network in ESXi server
4.3 Fedora Server 30 and backup server installation and configuration
Installation of Fedora operating system was done through porting the image to USB stick and then installing. Acquiring the IP address and DNS setting are through router’s DHCP server. In the router there are statically assigned leases in DHCP server, and also static ARP leases as well. Having in mind security issues mentioned above, after the installation, the system must be checked for newest updates and security patches with the following commands in terminal:
sudo dnf upgrade –refresh sudo dnf update
For security reason, only SSH connection will be permit in the server, whereas standard port number TCP/22 will be changed to port number 2882. Cockpit [16], which is GUI management center, will be left by default, as integrated part of Fedora system, working on TCP port 9090. IPtables are also used as local firewall rules for the server.
Installation of the newest NTP daemon from repository was done. Configuration of NTP server is done through ntp.conf file in /etc directory, viewed by standard vi or nano editor in Linux. Below is the complete built configuration for the NTP server.
# Record the frequency of the system clock. driftfile /var/lib/ntp/drift # Permit time synchronization with our time source, but do not # permit the source to query or modify the service on this system. restrict default nomodify notrap nopeer noepeer noquery # Permit association with pool servers. restrict source nomodify notrap noepeer noquery
22 22 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
# Permit all access over the loopback interface. This could # be tightened as well, but to do so would effect some of # the administrative functions. restrict 127.0.0.1 restrict ::1 # Hosts on local network are less restricted. restrict 10.10.10.0 mask 255.255.255.0 nomodify notrap restrict -4 default kod noquery nomodify notrap nopeer limited restrict -6 default kod noquery nomodify notrap nopeer limited # Use public servers from the pool.ntp.org project. server ntp.bsdbg.net iburst server ptbtime1.ptb.de iburst server zeit.fu-berlin.de iburst server ntp0.fau.de iburst # Reduce the maximum number of servers used from the pool. tos maxclock 5 # Enable public key cryptography. #crypto includefile /etc/ntp/crypto/pw # Key file containing the keys and key identifiers used when operating # with symmetric key cryptography. keys /etc/ntp/keys
As the NTPMK server is Stratum 2, four Stratum 1 servers are chosen for time synchronization. The servers are chosen to be with geolocation closer to the NTPMK, with best possible latency and reliability. For this purpose we the servers are located in Bulgaria and in Germany. Then, the NTP daemon automatically chooses with whom to sync, according to the best possible time offset, as can be seen in Fig. 7 by issuing ntpq -p command in the terminal, presented.
Fig. 7. Screen of ntpq -p command issued in NTPMK server
This means that users will always have best service at any time. In order to provide reliable operation, backup of critical files is crucial. Backup of all configurations to an external local server is done with cron job in Fedora server.
23 23 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
With this command, backup of all-important folders is done in case of disaster. Cron job is configuration as described below:
0 4 * * 0 rsync -avzh --progress --delete -e ssh /etc /home /root /var/www [email protected]:/backup-ntp >/dev/null 2>&1 4.4 Network and firewall setup For router and firewall device, a Mikrotik RB2011 device is used, as can be seen in Fig. 8.
Fig. 8. An actual picture of Mikrotik RB2011 used as main router and firewall for NTP server.
The Mikrotik RB2011 uses RouterOS, an operating system developed by Mikrotik [17]. The RouterOS can be managed through terminal and via GUI application Winbox [18]. On the Fig. 9, a detailed diagram of the network setup is shown, where Mikrotik is placed in front of the VMware ESXi hypervisor and behind the main gateway. Separation of production and management network is also shown.
24 24 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Fig. 9. A detailed network diagram with all elements
Mikrotik Internet configuration is done with static local IP parameters, assigned from the main gateway. For DNS settings an OpenDNS servers are used for better security, developed and maintained by Cisco Division [19]. As can be seen from Fig. 9, separate L3/L2 bridge domain is configured for NTPMK server for separating production from management, meaning separate DHCP server with static assigned leases. Source NAT (SRCNAT) is used for outbound and destination NAT (DSTNAT) for inbound connections are used. Destination NAT is also used for port forwarding [20]. Fig. 10 shows the NAT settings for NTPMK server, whereas Fig. 11 shows the forward chain of firewall settings.
25 25 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Fig. 10. NAT settings in Mikrotik router for NTPMK
Fig. 11. Forward firewall chain settings in Mikrotik router for NTPMK
Regarding the security, firewall rules are specifically planned and designed for overall protection of the whole network. Some rules are logged for debugging and analytics reasons. Connection tracking is been modified for better utilization of router’s CPU and DDoS mitigation. Moreover, timing of the UPD timeout and UDP stream timeout is set to minimum, in order the UDP seasons to break down faster. This is very useful for offloading the router’s CPU and for mitigation of DDoS attacks.
4.5 Configuration of Advanced DNS Settings for ntpmk.net Hostname
Hostname for NTPMK sever is configured for using ntpmk.net domain as NTP server and http/https web site for information. Public IP address of the server is 78.157.31.21 and the domain is registered with Name Cheap [21], a hostname and DNS provider. For binding the domain name with the IP address, two “A” records are used in the DNS settings and CHAME record is used for positive SSL certificate of the web site, as can be seen on Fig. 12. With this, clients can use the domain name - ntpmk.net or the IP address in theirs NTP clients for time synchronization.
Fig. 12. Advanced DNS settings for the ntpmk.net domain, used for NTP synchronization and http/https web site for information
26 26 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
4.6 Testing and Registration
Testing and comparison is performed against most popular NTP server, integrated in all Windows operating systems time.windows.com. Trace route and connection lag is been used against time.windows.com in comparison to ntpmk.net. Fig. 13 presents how traffic is going from source IP in Republic of North Macedonia to the destination NTP server of Microsoft, and Fig. 14 shows the traffic flow towards the NTPMK server. Open Visual Trace Route application is being used for this purpose [22] and standard ntpdate -q command in Linux systems [23]. Testing time.windows.com NTP server with Open Visual Trace Route application:
Fig. 13. Testing trace route against time.windows.com NTP server
Testing ntpmk.net NTP server with Open Visual Trace Route application:
Fig. 14. Testing trace route against ntpmk.net NTP server
The following is result obtained from testing time.windows.com NTP server with ntpdate -q command: [root@serv2 ~]# ntpdate -q time.windows.com server 51.140.127.197, stratum 2, offset -0.005221, delay 0.08374 22 May 22:17:20 ntpdate[12998]: adjust time server 51.140.127.197 offset -0.005221 sec
The following is result obtained from testing ntpmk.net NTP server with ntpdate -q command: [root@serv2 ~]# ntpdate -q ntpmk.net server 78.157.31.21, stratum 2, offset 0.000653, delay 0.02681 22 May 22:18:55 ntpdate[13015]: adjust time server 78.157.31.21 offset 0.000653 sec
27 27 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
From the tests performed, it can be concluded that ntpmk.net provides much better timing offset against time.windows.com NTP server. Ntpmk.net NTP server provides unmatched results (see Fig. 14), because of the geolocation of the server. The route to NTP server time.windows.com goes over 46 hops and connection is being made over the Atlantic (see Fig. 13), whereas connection to ntpmk.net goes just over two hops terminating at the state. Registration was made at Network Time Foundation which is providing NTP public support services and hosting IETF NTP Working group [24]. As mentioned, NTPMK is Stratum 2 server and it is successfully being registered as secondary time server. With this, NTPMK become first official fully functional public NTP Stratum 2 server for Republic of North Macedonia.
5. CONCLUSION
The purpose of this research paper and practical project is creating a first free public NTP server for time synchronization in Republic of North Macedonia and rest of Europe. Before the realization of our project, all clients and applications were forced to use some other external NTP server, outside the state, thus having very high connection lag and low quality of service. After presenting basics on NTP version protocols, their vulnerabilities and SNTP servers’ structure, we presented our setup and requirements for this project. The server is made upon latest VMware ESXi hypervisor for greater scalability and reliability. The NTP server is successfully deployed in production and is registered in the official NTP register. With this, the clients using NTP services devices or application in Republic of North Macedonia and rest of Europe will have better services (timing offset), whereas ISPs’ main routing peering uplinks will be significantly offloaded. On the other hand, security issue will be on lower level. As future work, the NTMK server can be put in the official public pool of NTP servers, which would further benefit the cause, serving a larger number of clients. This means that anyone using this pool for time synchronization will get time synchronization from randomly chosen time servers using Round Robin algorithm dedicated for client’s geolocation region. Many Linux distribution and IoT makers are using pool.ntp.org as their default NTP server. The problem is solved with our NTPMK server. Now, a first publicly available server is serving our region, free for all to use, contributing to clients and ISP providers, providing them with cleaner traffic to handle.
28 28 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
References
1. IETF, RFC-5905 (2010) Network Time Protocol Version 4: Protocol and Algorithms Specification. Available at https://tools.ietf.org/html/rfc5905 (last accessed in May 2, 2019) 2. NTP History, (2019) History of NTP. Available at http://www.ntp.org/ntpfaq/NTP-s- def-hist.htm (last accessed on May 4, 2019) 3. IETF, RFC-778 (1981) DCNET Internet Clock Service. Available at https://tools.ietf.org/html/rfc778 (last accessed on May 3, 2019) 4. IETF, RFC-958 (1985) Network Time Protocol (NTP). Available at https://tools.ietf.org/html/rfc958 (last accessed on May 2, 2019) 5. IETF, RFC-1059 (1988) Network Time Protocol (Version 1) Specification and Implementation. Available at https://tools.ietf.org/html/rfc1059 (last accessed on May 2, 2019) 6. IETF, RFC-2405 (1998) The ESP DES-CBC Cipher Algorithm with Explicit IV. Available at https://tools.ietf.org/html/rfc2405 (last accessed on May 2, 2019) 7. IETF, RFC-1305 (1992) Network Time Protocol (Version 3) Specification, Implementation and Analysis. Available at https://www.ietf.org/rfc/rfc1305.txt (last accessed on July 27, 2018) 8. IETF, RFC-2030 (1996) Simple Network Time Protocol (SNTP) Version 4 for IPv4, IPv6 and OSI. Available at https://tools.ietf.org/html/rfc2030 (last accessed on May 1, 2019) 9. IANA, (2019) IPv4 Multicast Address Space Registry. Available at https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml (last accessed on May 8, 2019)
10. Cisco Press, Internet Protocol Journal, Internet Protocol Journal, Volume 15, No. 4 (2012) Protocol Basics: The Network Time Protocol. Available at https://www.cisco.com/c/en/us/about/press/internet-protocol-journal/back- issues/table-contents-58/154-ntp.html (last accessed on July 1, 2017) 11. Time and Date, (2012) International Atomic Time. Available at https://www.timeanddate.com/time/international-atomic-time.html (last accessed on October 22, 2017) 12. Insights, (2017) Best Practices for NTP Services. Available at https://insights.sei.cmu.edu/sei_blog/2017/04/best-practices-for-ntp-services.html (last accessed on May 14, 2019) 13. EECIS UDEL, (2019) NTP Security Analysis. Available at https://www.eecis.udel.edu/~mills/security.html (last accessed on May 3, 2019) 14. Infosecinstitute, (2014) Network Time Protocols, NTP Threads and Countermeasures. Available at https://resources.infosecinstitute.com/network-time-protocol-ntp-threats- countermeasures/#gref (last accessed on June 21, 2017) 15. Techtarget-Search Virtualization (2011) VMware ESXi Features. Available at https://searchservervirtualization.techtarget.com/tip/Understanding-VMware-ESXi- features (last accessed on July 29, 2019) 16. Cockpit Project, (2013) Fedora Cockpit. Available at https://cockpit-project.org/ (last accessed on May 8, 2019) 17. Mikrotik Software, (2014) RouterOS. Available at https://mikrotik.com/software (last accessed on April 9, 2019) 18. Mikrotik Winbox, (2014) Winbox. Available at https://mikrotik.com/download (last accessed on May 16, 2019)
29 29 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
19. Cisco OpenDNS, (2015) OpenDNS. Available at https://www.opendns.com/ (last accessed on May 21, 2019) 20. Mikrotik NAT, (2010) Manual:IP/Firewall/NAT. Available at https://wiki.mikrotik.com/wiki/Manual:IP/Firewall/NAT (last accessed on February 21, 2019) 21. Name Cheap Hosting, (2000) Name Cheap. Available at https://www.namecheap.com/ (last accessed on May 22, 2019) 22. Visual Trace Route, (2016) Open Visual Trace Route. Available at https://www.visualtraceroute.net/ (last accessed on November 22, 2018) 23. IBM Knowledge Center, (2012) Ntpdate Command. Available at https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.cmds4/n tpdate.htm (last accessed on May 22, 2019) 24. NTP Network Time Foundation, (2007) Network Time Foundation’s NTP. Available at http://support.ntp.org/bin/view/Main/WebHome (last accessed on May 20, 2019)
30 30 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Risk analysis and risk management in IT industry
Aleksandar Kovachev and Vesna Dimitrova
University “Ss. Cyril and Methodius”, Faculty of Computer Science & Engineering Skopje, Macedonia [email protected], [email protected]
Abstract. The more our business relies on IT, the more important it is to identify and control the risks that could affect the IT systems. Threats ranging from equip- ment failure to malicious attacks by hackers have the potential to disrupt critical business systems or open up access to any confidential data. This paper explains what is IT risk and the different types of IT risk that can affect us. It outlines how to carry out an IT risk assessment, and how to develop an effective IT risk man- agement process. It also explains the IT security management standards and pol- icies, and the interconnected role of IT risks and business continuity, because there are many standards for IT security as ISO 27000 and many others as ISO 12207 for system engineering that include some risk management issues or some standards for risk management like ISO 31000. Finally, it gives us a simple IT risk management checklist to help us assess and address any technology risks in the business.
Keywords: Risk Management × Treat × Risk Types × Risk Categories × IT Indus- try × Incident Management × Risk Standards.
1 Introduction
Our economy, society and individual lives have been transformed by digital technologies. They have enabled improvements in science, logistics, finance, communications and a whole range of other essential activities. As a consequence of this, we have come to depend on digital technologies, and this leads to very high expectations of how reliable these technologies will be. There are many standards for IT security as ISO 27000 and many others as ISO 12207 for system engineering that include some risk management issues or some standards for risk management like ISO 31000that can be followed [6] [7] [5]. Every organization has to make difficult decisions around how much time and money to spend protecting their technology and services; one of the main goals of risk management is to inform and improve these decisions. People have had to deal with dangers throughout history, but it’s only relatively recently that they’ve been able do so in a way that systematically anticipates and aspires to control risk. This paper is based on research and survey method of an existing and proven works and it can be found useful to: ─ Identify the common risks associated with a small business; ─ Identify the external and internal factors which affect risk for a small business;
31 31 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
─ Identify situations that may cause risk for a small business; ─ Identify the common warning signs of risk for a small business; ─ Implement, monitor, and evaluate a risk management plan for a small business. Risk management is the process that allows IT managers to balance the operational and economic costs of protective measures and achieve gains in mission capability by protecting the IT systems and data that support their organizations’ missions. This pro- cess is not unique to the IT environment; indeed, it pervades decision-making in all areas of our daily lives [11]. There are many works developed related to this field of research. Some of them are emphasizes the management of the risks, some of them are analyzing the nature of the risks, and some of them are developing a risk management tools for bigger project suc- cess. In [12] authors present the revised risk framework for analyzing IT outsourcing and the implications of these findings in practice. In [1] authors are indicating new directions for research in the relation between risk management and project success and they are presenting the key elements, which are, stakeholder perception of risk and suc- cess and stakeholder behavior in the risk management process. In [8] authors are dis- cussing approaches to risk analysis appropriate for IT and they are presenting some suggested tools for risk analysis and management. In the content below, in Section 2 will be described the different IT risk categories and types, in Section 3 will be discussed about the impact of IT risk management failure to business. Further, in in Section 4 will be discussed about the meaning of IT risk management, what are the steps of IT risk management, what are the risk elements and controls. It will be discussed about the IT risk assessment methodology and the types of the same and in the Section 5 will be presented the IT risk management checklist for more effective management of the IT risks in the business. And at the end of the paper there will follow a conclusion and list of references and used literature.
2 Risk in IT Industry
Information technology or IT risk is any threat to the business data, critical systems and business processes. It is the risk associated with the use, ownership, operation, involve- ment, influence and adoption of IT within an organization. IT risks have the potential to damage business value and often come from poor man- agement of processes and events.
2.1 Categories of IT Risks
IT risk spans a range of business-critical areas, such as: ─ security – ex. compromised business data due to unauthorized access or use; ─ availability – ex. inability to access the IT systems needed for business opera- tions; ─ performance – ex. reduced productivity due to slow or delayed access to IT sys- tems;
32 32 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
─ compliance – ex. failure to follow laws and regulations (ex. data protection). IT risks vary in range and nature. It's important to be aware of all the different types of IT risk potentially affecting our business.
2.2 Different Types of IT Risk
The IT systems and the information that the company hold on them face a wide range of risks. If the business relies on technology for key operations and activities, we need to be aware of the range and nature of those threats. Threats to the IT systems can be external, internal, deliberate and unintentional. Most IT risks affect one or more of the following: ─ Business or project goals – ex. Failure to deliver the project on time; ─ Service continuity – ex. Inability to continue with further development of new functionalities; ─ Bottom line results – ex. Business not advancing because of continuous bad per- formance; ─ Business reputation – ex. Bad reputation gained by bad management decisions regarding the project risks; ─ Security – ex. Security breach in IT sector; ─ Infrastructure – ex. Failure of company technical infrastructure.
Looking at the nature of risks, it is possible to differentiate between: ─ Physical threats - resulting from physical access or damage to IT resources such as the servers. These could include theft, damage from fire or flood, or unauthor- ized access to confidential data by an employee or outsider; ─ Electronic threats - aiming to compromise the business information – ex. a hacker could get access to the website, the IT system could become infected by a computer virus, or we could fall victim to a fraudulent email or website. These are commonly of a criminal nature; ─ Technical failures - such as software bugs, a computer crash or the complete failure of a computer component. A technical failure can be catastrophic if, for example, we cannot retrieve data on a failed hard drive and no backup copy is available; ─ Infrastructure failures - such as the loss of the internet connection can interrupt the business; ─ Human error - is a major threat - ex someone might accidentally delete important data, or fail to follow security procedures properly.
3 Potential Impact of IT Failure in Business
For businesses that rely on technology, events or incidents that compromise IT can cause many problems. For example, a security breach can lead to:
33 33 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
─ identity fraud and theft; ─ financial fraud or theft; ─ damage to reputation; ─ damage to brand; ─ damage to the business physical assets.
Failure of IT systems due to downtime or outages can result in other damaging and diverse consequences, such as: ─ lost sales and customers; ─ reduced staff or business productivity; ─ reduced customer loyalty and satisfaction; ─ damaged relationship with partners and suppliers.
If IT failure affects our ability to comply with laws and regulations, then it could also lead to: ─ breach of legal duties; ─ breach of client confidentiality; ─ penalties, fines and litigation; ─ reputational damage.
If technology is enabling the connection to customers, suppliers, partners and busi- ness information, managing IT risks in the business should always be a core concern. We can assess and measure IT risks in different ways. To find out how to carry out an IT risk assessment and learn more about IT risk management process [4].
4 IT Risk Management
In business, IT risk management entails a process of identifying, monitoring and managing potential information security or technology risks with the goal of mitigating or minimizing their negative impact. Examples of potential IT risks include security breaches, data loss or theft, cyber- attacks, system failures and natural disasters.
4.1 Why Risk Management Matters
Risk management exists to help us to create plans for the future in a deliberate, respon- sible and ethical manner. This requires risk managers to explore what could go right or wrong in an organization, a project or a service, and recognizing that we can never fully know the future as we try to improve our prospects. Risk management is often perceived as a technocratic and dull profession. Risk man- agement is about analyzing our options and their future consequences, and presenting that information in an understandable, usable form to improve decision making [10].
34 34 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
4.2 Steps in the IT Risk Management Processes To manage IT risks effectively, we need to follow these six steps in the risk manage- ment process:
• Identify risks - determine the nature of risks and how they relate to the business. To take a look at the different types of IT risk; • Assess risks - determine how serious each risk is to the business and prioritize them. To carry out an IT risk assessment; • Mitigate risks - put in place preventive measures to reduce the likelihood of the risk occurring and limit its impact. To find solutions in our IT risk management checklist; • Develop incident response - set out plans for managing a problem and recovering the operations. Form the IT incident response and recovery strategy; • Develop contingency plans - ensure that the business can continue to run after an incident or a crisis; • Review processes and procedures - continue to assess threats and manage new risks.
4.3 Elements of Risk Once we have identified our scope, most component-driven approaches require the risk analyst to assess three elements of risk. These three elements are typically described by the terms impact, vulnerability and threat. Risk is the intersection of impact, threats, and vulnerabilities. In order for a risk to be realized, a force (threat) overcomes re- sistance (vulnerability) causing impact (bad things). We are familiar with measuring forces and resistances (resistance is a force in the opposite direction) which is why we see another abused formula: Risk = Likelihood * Impact. Because threat and vulnera- bility are both a force and may be easily combined into this new “likelihood” [9]. Impact. Impact is the consequences of a risk being realized. When conducting com- ponent-driven risk assessments, impact is usually described in terms of the conse- quences of a given asset being compromised in a given way. This impact is described in different ways, but one of the more common techniques is to assess the impacts to confidentiality, integrity and availability, of information. For example, an impact might be described in terms of a loss of confidentiality of a customer dataset, or the corruption (loss of integrity) of the company's accounts. A loss of one of these properties can be connected to other types of consequence, such as lost money, loss of life, delays to projects, or any other kind of undesirable outcome. Vulnerability. A vulnerability is a weakness in a component that would enable an impact to be realized, either deliberately, or by accident. For instance, a vulnerability could be a piece of software which allows a user to illegitimately increase their user account privileges, or a weakness in a business process (e.g. not properly checking the identity of someone before issuing them credentials for access to an online system or service). Regardless of the type of vulnerability in question, it is something which can be used to cause an impact.
35 35 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Treat. Threat is the individual, group or circumstance which causes a given impact to occur. For example, this could be: ─ a lone hacker, or a state-sponsored group; ─ a member of staff who has made an honest mistake; ─ a situation beyond the control of the organization (such as high-impact weather).
The purpose of assessing threat is to improve the assessment of how likely a given risk is to be realized. Typically, when assessing a human threat actor, analysts consider people who are likely to want to harm the organization. This then allows them to con- sider both the capability and intent of these groups. Risk experts use threat taxonomies and categorization methods to provide a common language of threat capability. One of the weaknesses of threat assessments is that a threat's capability can change rapidly, and reliable information on these changes can be hard to come by. Because of this, don't treat the assessments of threat capability as if they are static, and make sure to recognize uncertainty and variability in the threat assessments.
4.4 IT Risk Controls
As part of the risk management, we need to try to reduce the likelihood of risks affecting the business in the first place. And to put in place measures to protect our systems and data from all known threats. Regarding this, we should: ─ Review the information we hold and share. Make sure that we comply with data protection legislation and think about what needs to be on public or shared sys- tems. Where possible, remove sensitive information; ─ Install and maintain security controls, such as firewalls, anti-virus software and processes that help prevent intrusion. See how to protect our business online; ─ Implement security policies and procedures such as internet and email usage pol- icies, and train staff. Follow best practice in cyber security for business; ─ Use a third-party IT provider if it lacks in-house skills. Often, they can provide its own security expertise. See how to choose an IT supplier for our business.
If we can't remove or reduce risks to an acceptable level, we may be able to take action to lessen the impact of potential incidents. We should consider: ─ setting procedures for detecting problems (ex. a virus infecting the system); ─ getting insurance against the costs of security breaches.
4.5 IT Risk Assessment Methodology
When assessing the IT risks in the business, we should avoid spending too much time and money reducing risks that may pose little or no threat to the business. Instead, we should focus on the most serious risks, based on:
36 36 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
─ The likelihood of the risk happening - the Probability of occurrence of an impact that affects the environment; ─ The cost or impact if it does happen - the Environmental impact if an event oc- curs.
This is known as the quantitative risk assessment.
Risk = Consequence x Likelihood (1)
The C × L matrix method therefore combines the scores from the qualitative or semi- quantitative ratings of consequence (levels of impact) and the likelihood (levels of prob- ability) that a specific consequence will occur (not just any consequence) to generate a risk score and risk rating. Essentially, the higher the probability of a "worse" effect occurring, the greater the level of risk. This C x L risk assessment process involves selecting the most appropriate combination of consequence and likelihood levels that fit the situation for a particular objective based upon the information available and the collective knowledge of the group (including stakeholders, academics, managers, in- dustry, researchers and technical staff) involved in the assessment process [9]. When making decisions about what are appropriate combinations of consequence and likelihood, if more than one combination of consequence and likelihood is consid- ered plausible, the combination with the highest risk score should be chosen (i.e. this is consistent with taking a precautionary approach) [2].
4.6 Types of IT Risk Assessment Methodologies
Generally speaking, there are two main types of risk assessment: ─ qualitative risk assessment; ─ quantitative risk assessment.
A quantitative assessment combines the probability of risk occurring and the costs of impact and recovery. For example, if we assess a risk to have a high probability of happening and a potential of high cost/impact to the business, we will deem it high risk. Those that are unlikely to happen or will cost little or nothing to remedy, will fall under the category of low risk. Quantitative measures of risk like this are only meaningful when we have good data. We may not always have the necessary historical data to work out probability and cost estimates on IT-related risks, since they can change very quickly. A more practical approach may be to use a qualitative assessment. This relies on using our judgement to decide if the probability of occurrence is high, medium or low. For example, we might classify as 'high probability' something that we expect to happen several times a year. We can do the same for cost/impact in whatever terms seem useful, for example: ─ low - would lose up to half an hour of production;
37 37 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
─ medium - would cause complete shutdown for at least three days; ─ high - would cause irrevocable loss to the business.
We might then decide to rank risks according to the significance to the business. Once we establish what the risks are and how important they are to the business, we will be able to decide whether to accept the risks or manage them. Sometimes, it may be best to use a mixed approach to IT risk assessments, combin- ing elements of both quantitative and qualitative analysis. We can use the quantitative data to assess the value of assets and loss expectancy, but also involve people in our business to gain their expert insight. This may take time and effort, but it can also result in a greater understanding of the risks and better data than each method would provide alone.
4.7 Common Examples of Risk Management Regardless of a company's business model, industry or level of earnings, business risks must be identified as a strategic aspect of business planning. Once risks are identified, companies take the appropriate steps to manage them to protect their business assets. The most common types of risk management techniques include avoidance, mitigation, transfer, and acceptance. [3]
Avoidance of Risk. The easiest way for a business to manage its identified risk is to avoid it altogether. In its most common form, avoidance takes place when a business refuses to engage in activities known or perceived to carry a risk of any kind. For in- stance, a business could forgo purchasing a building for a new retail location, as the risk of the venue not generating enough revenue to cover the cost of the building is high.
Risk Mitigation. Businesses can also choose to manage risk through mitigation or re- duction. Mitigating business risk is meant to lessen any negative consequence or impact of specific, known risks, and is most often used when those risks are unavoidable. For example, software companies mitigate the risk of a new program not functioning cor- rectly by releasing the product in stages. The risk of capital waste can be reduced through this type of strategy, but a degree of risk remains.
Transfer of Risk. In some instances, businesses choose to transfer risk away from the organization. Risk transfer typically takes place by paying a premium to an insurance company in exchange for protection against substantial financial loss. For example, project insurance can be used to protect a company from the costs incurred when a project or other client related part is not performing well.
38 38 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Risk Acceptance. Risk management can also be implemented through the acceptance of risk. Companies retain a certain level of risk brought on by specific projects or ex- pansion if the anticipated profit generated from the activity is far greater than its poten- tial risk. For example, software companies often utilize risk retention or acceptance when developing a new product. The cost of research and development does not out- weigh the potential for revenue generated from the sale of the new product, so the risk is deemed acceptable.
5 IT Risk Management Checklist
Risk management can be relatively simple if we follow some basic principles. To manage the IT risks to the business effectively, we do the following: ─ Think about IT security from the start when plan or update the IT system. Discuss the needs and potential problems with the system users; ─ Actively look for IT risks that could affect the business. Identify their likelihood, costs and impact. Carry out a comprehensive IT risk assessment; ─ Think about the opportunity, capability and motivation behind potential attacks. Understand the reasons for cyber-attack; ─ Assess the seriousness of each IT risk and focus on those that are most significant. ─ Understand the relevant laws, legislations and industry guidelines, especially if we have to comply with the General Data Protection Regulation (GDPR); ─ Configure the PCs, servers, firewalls and other technical elements of the system. Keep software and hardware equipment up to date. Put in place other common cyber security measures and read about securing the wireless network; ─ Do not rely on just one technical control (ex. a password). Use two-factor authen- tication to guarantee user identity – ex. something we have (such as an ID card) and something we know (a PIN or password); ─ Develop data recovery and backup processes and consider daily back-ups to offsite locations; ─ Support technical controls with appropriate policies, procedures and training. Un- derstand the most common insider threats in cyber security; ─ Make sure that we have a business continuity plan. This should cover any serious IT risk that we cannot fully control. Regularly review and update our plan; ─ Establish an effective IT incident response and recovery measures, as well as a recording and management system. Simulate incidents to test and improve our incident planning, response and recovery; ─ Develop and follow specific IT policies and procedures, such as on email and internet use, and make sure our staff know what falls under acceptable use; ─ Consider certification to the IT security management standards for the business and trading partners.
39 39 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
In any type of project planning, risk management is a necessary tool. Risk manage- ment identifies and prioritizes risks, measures how harmful they can be, and develops a plan to deal with risks that are a threat to the project. As we develop our risk management plan, including the risks and how they will be dealt with, a risk checklist should quickly tell us from past experience and forecasting if a risk area will evolve.
6 Conclusion
In most organizations and companies, the network itself will continually be expanded and updated, its components changed, and its software applications replaced or updated with newer versions. In addition, personnel changes will occur, and security policies are likely to change over time. These changes mean that new risks will surface and risks previously mitigated may again become a concern. Thus, the risk management process is ongoing and evolving. This paper analyzes the risks in the IT industry, how they are applied in today’s IT companies and emphasizes the risk management and need for an ongoing risk evalua- tion and assessment and the factors that will lead to a successful risk management pro- gram, also making a path for possible future practical researches in this field. In conclusion that the risk management is crucial for running a successful business and there should be a specific schedule for assessing and mitigating mission risks within the company, but the periodically performed process should also be flexible enough to allow changes where warranted, such as major changes to the IT system and processing environment due to changes resulting from policies and new technologies.
References
1. De Bakker, K., Boonstra, A., Wortmann, H.: Does risk management contribute to IT project success? A meta-analysis of empirical evidence. International Journal of Project Manage- ment 28(5), 493–503 (2010) 2. Food and Agriculture Organization of the United Nations: Qualitative Risk Analysis, http://www.fao.org/fishery/eaf-net/eaftool/eaf_tool_4/en 3. Horton, M.: Common Examples of Risk Management, Trading Skills & Essentials (2018) 4. Invest Northern Ireland: the official online channel for business advice and guidance in Northern Ireland, https://www.nibusinessinfo.co.uk/content/what-it-risk 5. ISO 31000 Risk management, https://www.iso.org/iso-31000-risk-management.html 6. ISO/IEC 27001 Information security management, https://www.iso.org/isoiec-27001-infor- mation-security.html 7. ISO/IEC/IEEE 12207:2017 Systems and software engineering — Software life cycle pro- cesses, https://www.iso.org/standard/63712.html 8. McGaughey Jr, R.E., Snyder, C.A., Carr, H.H.: Implementing information technology for competitive advantage: risk management issues. Information & Management 26(5), 273– 280 (1994)
40 40 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
9. Policy-oriented marine Environmental Research for the Southern European Seas (Perseus),: United Nation supported organization, http://www.perseus-net.eu/site/content.php?ar- tid=2204 10. Risk management guidance from the National Cyber Security Center in UK, https://www.ncsc.gov.uk/collection/risk-management-collection?curPage=/collection/risk- management-collection/essential-topics/introduction-risk-management-cyber-security- guidance 11. Stoneburner, G., Goguen, A., Feringa, A.: Risk management guide for information technol- ogy systems. Special Publication pp. 800–300 (2002) 12. Willcocks, L.P., Lacity, M.C., Kern, T., Risk mitigation in IT outsourcing strategy revisited: longitudinal case research at LISA. The Journal of Strategic Information Systems 8(3), 285- 314 (1999)
41 41 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
A Review-based Survey of Mobile Student Services
Sasho Gramatikov1, Aleksandar Spasov1, Ana Gjorgjevikj1, and Dimitar Trajanov1
Faculty of Computer Science and Engineering, Ss Cyril and Methodius University in Skopje, North Macedonia [email protected], [email protected], [email protected], [email protected]
Abstract. Mobile services tend to become ubiquitous. Following the trend of mobile applications becoming part of everyday life in all areas, there is a need for putting the mobile services in action for education purposes. This paper intent is to give an overview of the mobile student services that are currently being implemented and used by a list ofac- knowledged and proved world rank universities. Apart from the overview of their features, the paper analyses the users’ feedback in form of rat- ings, reviews and downloads from the mobile market applications and uses this data to measure the influence of their features on the over- all users’ satisfaction and highlight the directions of the future mobile services.
Keywords: Mobile student services · University mobile applications · Feedback analysis.
1 Introduction
Smartphones have managed to get their place in every branch of the peoples’ life – establishing phone calls, Internet browsing, instant messaging, business, working with different applications while on the go. Due to the increasing num- ber of features, processing power, storage capacity and battery endurance, their number has been rapidly increasing from day to day. Thus, the number of mobile internet users in 2018 is about 3.3 billions and it has a serious tendency to be- come even bigger in the upcoming years [17]. A considerate part of these users are young people. In 2018, 94% of the young people in US aged be-tween 18 and 29 owned and used a smartphone. 28% of them used their smartphones at home with no broadband connectivity [3]. Young population has become dependent on smartphones, and the major reasons for this are the online services and the easy Wi-Fi Internet access they get to provide these services [19]. This convenience results with over 77% Americans going online on daily basis [2]. Apart from the vast use of the smartphones in the everyday life activities, they also take part in the educational process. More than 44% of the young adults are accessing
42 42 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
educational materials [4] which proves that mobile devices can be a great tool for improving the process of education [6]. Although the smartphones are widely accepted by the students, there are still obstacles that slow down the process of their deeper penetration in the educational process, such as distraction of students during classes [15], univer- sity regulations that prohibit their use in classrooms, need for regular battery charge, lack of faculty member cooperation and limited charging points, to name a few [8]. However, as the universities address these problems by suppressing the technical obstacles, accepting the importance of the smartphones and providing proper professional training to the educators, the smartphones are quickly be- coming incredible classroom resource [11]. The students are not alone intheuse of the smartphones. Significant part of the academic staff replaced their laptops by the smartphones, which they use not only for personal purposes, but also for sharing knowledge with the students and supervising their projects and student activities [7]. There are many examples where mobile phones are part of the educational process both in secondary and higher schools and prove to affect the process in a positive manner [13]. Apart from the educational process, smartphones impact their social life, physical activities and psychological well-being [15]. Universities around the world are using this advantage and have created mobile applications and services that can be used by their students to access the learning materials, to be up-to-date with the events and to collaborate in between [5, 22, 14]. Another crucial importance of these services is that they help the students, especially the international students [21] and those who move to new cities, to integrate into the new environment and be-come part of communities sharing similar ideas and interests [19]. By means of the smartphones, the students and the academic staff also become part of virtual communities which enables them be part of the everyday student life even if they are not present on the campus premises [16]. A mobile student service is an application based on mobile computing and context- aware application concepts that can provide more user-centric information and services to the students [24, 9]. Because of the numerous advantages of the mobile applications versus the web applications and the preference of the students to use native application ver- sus using the existing web applications on their smartphones [10], most of the universities that follow the technological advances move towards utilization of mobile applications as part of their educational process. “Because of the charac- teristics of mobility, mobile learning has become the main direction for the future development of higher education, and also drawn attention in the information construction of a new stage in universities” [18]. The advantage of using mobile devices in education is that they provide the students easy and quick access to educational materials and answers to their questions in a conformable environment, and at the same time, they still give them the opportunity to be in a social interaction with the other students or teachers [1, 16]. Therefore, many of the world recognized universitieshave invested in new mobile applications and use them to assist in the education
43 43 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
process. Other universities use their existing information systems and create new mobile applications to access the existing services in a more convenient way [23].
The main purpose of the universities when selecting and designing the fea- tures of the mobile applications is that they meet the needs and requirements of the students from educational and social. According to the analysis conducted by the authors of [12], the most preferred characteristics of the mobile applica- tions is their availability and ease of use, while the entertaining features are at the least expected among students. Although the analysis considers thestudents satisfaction, it is based on a questionnaire among a geographically local group of students. In [20], a continental scale of analysis of features typical for the mobile application in the Australian universities is conducted, however, there is no in- formation about the student’s satisfaction. The goal of this research is to make a global feature-wise overview and comparison of the existing world universities mobile applications, taking into consideration the users’ satisfaction, aiming to find a correlation between certain features availability and users’ satisfaction. To our knowledge, no similar work that analyses both the features and the users’ satisfaction has been published to present date. In our work, we used the QS list as a starting point for choosing the world universities. Then, for eachuniversity, a search for mobile application was made in the Google Play Store and in the iTunes App Market. Afterwards, each representative application found for the university was installed and its features were analyzed. We conducted the same analysis in two different time periods: in the year 2016 and 2018. Our search on the application market resulted with a total of approximately 50 downloaded and installed mobile student applications. It is important to mention that one half of these universities are from the USA. The reason for such dominance of universities from USA is that this country occupies most of the top 100 positions in the university world ranking. The rest of the universities considered in this work are from other parts of the world, mostly from Europe, which gives a wider perspective of the use of the mobile applications in the world. For every installed application we collected the ratings and the re-views of the users and used them to evaluate their popularity and to pin point their advantages and disadvantages based on users descriptive opinion.
The focus of this paper is an overview of the services offered by the student mobile applications, their contribution to the overall rating of the application, as well as analysis of the users’ quality of experience when using theseservices.
The rest of the article is organized as follows: in Section 2 we give short overview of the different mobile application platforms considered in our work and the features of a set of downloaded mobile applications of some world top universities. Then, in Section 3, we analyze the results obtained from the users’ feedback in form of ratings and reviews. After the discussion in Section 4, we eventually give conclusion from our work.
44 44 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
2 Overview of mobile applications platforms and services
The mobile student services used by the universities of our list are developed for different leading platforms on the market. Most of them, 47 out of 50, are built to support both Android and iOS mobile platforms. The rest of the universities support only iOS or Android mobile platform. The third place of the world known mobile platforms is taken by Blackberry, which is supported by only 7 Universities. The fourth place is taken by the Windows Phone mobile platform, supported by only 3 universities. According to the basic services implemented in these applications, they can be divided in two major groups: open-access services, accessible by all users and limited-access services permitted to authenticated users only.
2.1 Open-access services The open-access services allow all users to view and access basic information about the university, the educational process and diverse university services with- out the need for previous authentication. Table 1 lists the open-access services that we identified and their frequency of appearance in the mobile applications.
Table 1: Overview of open-access services Name Description Freq. News and events Preview of news published and events related to 45 the university Location Guides the students and the visitors within the 41 campus Course info List of courses and programs of the university 40 Library Access and preview of the books of the Univer- 33 sity’s library Public transport Eases the access to the university campus with 25 public transport Office hours Office hours of the lecturers 24 Emergency contacts Emergency phone numbers 22 Sport Information about sport activities 16 Communica- tion Communication between two or more users 16 Food List of restaurants within the campus and menus 14 Multimedia Sharing of multimedia content among the stu- 14 dents Admission Information about the students that have applied 10 or want to apply for study program Photo Gallery of photos provided by the university 9 Broadcast Real-time broadcast of the student radio and TV 5 Career Overview of active jobs, internships, scholarshisp 5 and volunteering announcements
45 45 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
2.2 Limited access services The limited access services are features of the mobile applications available for authenticated users only. Compared to the open-access services, these services provide more information and details about the university and the authenticated user. According to our survey, these services are not very common in the mobile applications, which can be also seen in Table 2.
Table 2: Overview of limited access services Name Description Freq. Contact Information for the students registered in the uni- 16 versity database Grades Grades for enrolled courses 15 Schedule Schedule and location of classes for enrolled 14 courses News and events News and events related to the user 13 Exam Time and location of exams 13 Library Book reservation 8 Career Career opportunities based on the user’s curricu- 3 lum Assigned task Eases the access to the university campus with 2 public transport
3 User feedback analysis
One of the major objectives of our survey on the mobile student services is to evaluate the users’ experience and measure their satisfaction of using these services. In order to achieve this goal, we collected data from the official mobile applications markets and their discussion forums, as well as from the official web sites of the universities. The data of interest for our work was the rating of the users, the number of reviews and their content, the approximate number of downloads, the number of features they contain and the world ranking of the universities. Following the trend of mobile application, a considerate part of the world universities have already implemented a mobile solution for accessing their in- formation. According to the services covered, the most advanced application for unauthenticated users is available for the students of the Harvard University. This application covered 14 out of total number of 17 services. From the users’ perspective, this application has set the pattern of how a student application should look since its design and action workflow managed to reduce the user actions to get to the proper place and to see the desired information. In the category of applications for authenticated users, the Ashford University developed the most advanced mobile application. It covers 7 out of 9 services
46 46 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
present in its category. Even though the two missing services are very handy for finding a contact from a colleague or a proper job, the Ashford University has implemented successfully the modules related to the enrolled courses and information and notification that are very basic and much needed for the students as target group that uses this application. The applications of the Oxford and the Harvard Universities are the ones with the most user-friendly features. The Oxford University application has an advanced method for finding the optimal route to reach the university using Google Maps and the user’s location. The Harvard University’s application pro- vides the easiest way to get and search through different sections. It allows more than one search criteria to be added as part of a search in every section. This feature allows consistency and easy accessibility in the different modules. On the scale from 1 to 5, the range of ratings in 2016 is from 2.1 to 4.6, and in 2018 from 2.6 to 4.7. From the information gathered from the application stores, the most downloaded application has 100.000 – 500.000 downloads and the least downloaded application has 1.000-5.000 downloads.
3.1 User ratings analysis
Our first goal was to find the influence of a single feature of the application on the overall application rating. Since the rating data we gathered referstotheuser satisfaction of the entire set of services implemented in the applications, we mea- sured the average rating of the applications that contain certain feature based on four different criteria. For that purpose, we defined four different measures of average rating of the applications depending on the presence of different services correlated to parameters like total number of implemented services, number of downloads and reviews. Let us assume that there are n mobile applications in the data set A = {A1,...,An}. Each application from the data set can have up to m services belonging to the service set S = {S1,...,Sm}. The presence of the service Sj in the application Ai is marked with the variable aij having value 1 if Ai implements the service or value 0, otherwise. Each application has di downloads, it is rated with grade gi and has ri reviews from the users. Based on this definition, the first measure for the influence of certain service Sf on the application’s rating is the average rating of all the applications that implement that service expressed as: n i=1 aijgi Rf = ! n (1) =1 aif !i The second measure for the influence of service Sf on the applications’ popular- ity is the average rating of all applications that implement the service weighted according to their total number of services. The weight of the rating gi of appli- cation i that implements service f,i.e.,aif = 1, is calculated as the normalized m number of different services i=1 aif aij that the application implements. Thus, the rating of the applications! with more services will have higher influence on the value of the average rating of the applications that implement the same service.
47 47 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
The value of this average rating is expressed as:
n m i=1 j=1 aif aijgi Rf = ! n ! m (2) =1 =1 aif aij " !i !j For calculation of the third measure, we assume that each service j of the applica- tion i contributes with equal portion to its overall rating. Therefore, the weight of the rating of application i that implements the service f in the average rating is −1 m calculated as the normalized portion of that service =1 aif compared to #!j $ the total number of services. With such definition, given service of application with less services has higher influence on the rating of that application, com- pared to application with more services because its presence contributes more to the overall application’s rating. The expression for the average rating of the applications that implement service f is:
−1 n m =1 =1 aij aif gi !i #!j $ Rf = −1 (3) n m =1 =1 aij aif !i #!j $ The fourth measure of influence gives more credit to the applications that are not only more downloaded, but also stimulated the users that downloaded them to give feedback. The weight of the rating of each application i that implements service f is proportional to the normalized ratio of the number of reviews and the number of downloads ri/di. Hence, the average rating of the applications with service f is given as:
n ri i=1 di aijgi Rf = ! n ri (4) =1 aif % !i di Using these definitions and the information we obtained for the reviewed appli- cations of the set A, we calculated the four different measures for each possible feature of the set S. Then, we calculated the average value of these results and used it to sort the influence of each feature on the rating of the application in descending order. The results of these calculations for the two different time periods are shown in Fig. 1. The series rate, norm by features, norm by fea- ture presence and norm by reviews/downloads refer to the defined measures Rf , Rf , Rf and , Rf accordingly. The last series, named as average is calculated as" the average% of the four series. Fig. 1a shows that in 2016, the applications with the highest average user ratings implement the Pictures, Food, TV/Radio and Sports service. These services vary only slightly in the values of the average ratings and can be observed as a group of services that have significantly higher influence in the user satisfaction compared to the rest of the services. The later services have even influence that linearly decreases within the entire range, ex- cept the last two services, the Admission and the Office hours serviceswhichdo not follow the linear decreasing fashion and contribute less to the satisfaction of
48 48 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
the users. One of the main reasons of the low popularity of the Office hoursser- vices is that, according to the users’ comments, the data is sometimes outdated or not present at all.
(a)
(b)
Fig. 1: Overview of average contributions of features to applications popularity in the year a) 2016 and b) 2018
Fig. 1b shows that, in 2018, the applications with the highest average user ratings implement the Sport, Multimedia, Pictures, and TV/Radio service. Com- pared to the year 2016, the Dining and food service is replaced by the Sport service in the top popular services. The distinguishing popularity of the services related to multimedia data is expected since we are living in a modern age where the social platforms have deeply penetrated in the everyday life of the general population, especially the young people, and the exchange of multimedia data has become a common practice. Another considerable result from the compar- ison is that the communication services in the year 2018 are among the least popular services, which can be explained with the fact that the current mes-
49 49 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
(a)
‘ (b)
Fig. 2: Weighted rating of mobile applications for ascending order of world uni- versity ranking in the year a) 2016 and b) 2018
saging and social mobile applications are so dominant and well established [15], that the communication service of the universities’ application havelittleodds of becoming convenient tool for communication among the students. In Fig. 2, we show the ratings of the analyzed mobile applications in two different periods, organized in descending order based on their world ranking. As a reference, we used the QS world ranking list, which annually evaluates the universities based on their academic and employer reputation, number of citations, number of students and number of international students. The world best university has rank 1, and each following university in the figure is assigned value incremented by 1, although the rank difference is not necessarily 1. The figure shows only the order of these universities sorted by their rank. Some of the considered universities are not ranked in the QS world ranking, andthere- fore, we assume they have lower rank and add them as last items in the figure. Additionally, each university on the figure is represented with a circle with size proportional to the number of downloaded applications. Fig. 2a shows that in the year 2016, on macro level, the ratings of the applications follow a linear de- pendence on the university ranking. Their rating decreases as the world rank of
50 50 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
the university decreases. However, if we cluster the universities in regions based on the regularity of the ratings relative to the decreasing trend-line, we can see that the first region of the quarter top rated universities (with real ranks from 1 to 16) consists of applications with relatively high user rating placed above the decreasing line, following its direction. It is also important to note that the top three world universities, not only have one of the best rated applications, but also offer their service to responsive users for longest period. Therefore, these ratings can be considered as one of the most objectively calculated values.The second region, formed by the middle range of considered universities, consists of applications with highest deviation of ratings relative to the trend-line. Com- pared to the other ratings, this region contains the worst rated applications, but also one of the best rated. The last region also contains regular values of rat- ings placed near the trend-line. Even though the region consists of applications from lowest ranked universities, as well as from not ranked universities, the ap- plications have in average higher rate than the better-ranked universities from the second region. The figure also shows that many of the ratings are based on higher number of reviews. Contrary to the previous results, Fig. 2b shows that in the year 2018, on macro level, the ratings of the applications do not depend on the university ranking, which can be seen by the almost parallel to the x-axis trend-line. If we cluster the universities in regions based on the regularity of the ratings relative to the trend-line, we can see that there are three regions, where the applica- tions from the first and the third regions are above or near the trend-line, while the second region (in the middle) consists of applications with high deviation of ratings relative to the trend-line. This observation leads to the conclusion that the students are equally demanding and critically oriented to the mobile applications, no matter what quality of education the university offers. In the analysis of mobile applications in 2018, fewer mobile applications were considered since some of the applications considered inf 2016 were not on the market, or they were on the market, but the latest ratings were outdated.
3.2 User reviews analysis
The feedback of the users given in their comments is another important part of the survey. The comments are a descriptive measure of the user ratings and the quality of their experience, emphasizing the pros and cons of the features offered in the applications. For that purpose, we collected the comments and reviews of the analyzed applications written in English in two different periods, before the year 2016 and before the year 2018, and used their content to create word clouds. In our analysis, we preprocessed the users’ feedback using standardproce- dures in the natural language processing. First, we converted all words into low- ercase. Then, we performed tokenization using the NLTK (Natural Languages Tool-Kit) which separates the words and the punctuation into string tokens. In the next pre-processing phase, we removed the so-called stop-words. We used the
51 51 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
NLTK English stop-words list, which currently contains 153 stop-words, how- ever, we deliberately removed the words “no” and “not” from this list since they are essential for evaluation of the connotation of the comments. In the last phase, we performed stemming and lemmatization of the obtained token array which group and reduce inflected forms of a word, so they can be analyzed as single word. The stemming process uses heuristic that crops the ends of the words, while the lemmatization process uses a vocabulary and morphological analysis of the words to return their base form. We used the WordNet based lemmatizer from NLTK to perform the last operation. After measuring the frequency of appearance of each single word in the pre- processed comments, we created word-clouds. We deliberately omitted the word “app” since it is multiple times more frequent than any other word and gives no significance to the users’ quality of experience expression. The single-word word cloud generated from student service mobile applications comments collected up to 2016 are shown in Fig. 3.a. The size of a single word in the word cloud is proportional to its frequency of appearance in the pre-processed token array. According to the figure, the users have positive general experience since the most frequent word is “great”. There are also other positive words like “love”, “like” and “good”. However, many users share negative experience which can be con- cluded by the frequent presence of the explicit negations “not”, “no”, “can’t”. There are also implicit negative words which express malfunction or lack of up-to date information of the latest application version, such as “fix”, “update” and “need” or report a critical problem in a recent update. The word cloud from the comments collected between year 2016 and 2018 in Fig. 3.b demonstrates that students have much worse experience with the mobile applications intended for them compared to the earlier period. The most frequent words in the cloud are “work”, “not” and “crash” which, if combined, imply negative experience regarding the stability of the applications. Apart from the students experiencing problems and malfunctional features, there are satisfied users which considerate the mobile applications to be great and useful for obtaining information about the classes. Although the word clouds if Fig. 3 show the general connotation of the users’ experience, the overall opinion of the users cannot be precisely depicted since some of the words may appear in both positive and negative connotation. A typical example is the word “work” which students can use to complain that the application is not working properly or emphasize that the application works flawlessly. Other such frequent verbs are “open”, “access”, “show” and “read”. Therefore, we added another step in the preprocessing of the comments which pairs the consecutive words in so-called bigrams and measured their frequency of appearance. Based on the frequency, we created the bigram word clouds shown in Fig. 4. Fig. 4.a demonstrates that in the first period of analysis, the students consider that the university mobile application is great, they love it or just think that it is good and easy to use. Nevertheless, there are students which are not satisfied with the application offered by their university, and therefore, require that the issues are fixed, or new extra features are added. On the contrary, in
52 52 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
(a) (b)
Fig. 3: Word cloud of single words for student mobile applications reviews in the year a) 2016 and b) 2018
Fig. 4.b, we can see that in the second period of analysis, many the students are not satisfied with the applications because some of them do not work, do not open, crash or do not accepts users’ credentials. Yet, there is portion of the comments that imply positive user experience with the application offered by their university. Based on the single word and bigram word clouds we can conclude that there is a trend of bigger criticism of the newer versions of the student applications. This fact coincides with the results obtained from the analysis of the users rat- ing based on the world universities ranking. Since the mobile applications have become an inevitable part of students’ everyday routines, their requirements for more reliable and feature-rich applications have grown. As the mobile devices become more complex and equipped with more diverse and sophisticated sen- sors, the expectations for more complex applications are a natural consequence. Moreover, the mobile application market offers so many free, feature-rich appli- cations for navigation, content sharing, text and voice communications, that the university mobile applications must maintain high quality of service to keep their users satisfied. The universities should pay special attention to the services that are unique for the educational process such as course information, office hours, course materials and campus navigation. Based on the analysis of the reviews of all university applications that were found on the Market, we summarize that the users from all universities appreciate the most the detailed information pro- vided by the mobile applications. They like the good design of the applications, their responsiveness and the overall user experience. In most of the applications the users are satisfied with the accurate course information and schedules. They especially appreciate the applications that embed the useful information, instead of links that redirect the user to external apps or browsers. Other features that users find very useful are the maps and the routes that navigate them around the university campuses. The users of the student mobile applications, in most
53 53 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
(a) (b)
Fig. 4: Word cloud of bigrams for student mobile applications reviews in the year a) 2016 and b) 2018
of the cases, are irritated when the application crashes, feels buggy and unstable or when the provided information is inaccurate and not up to date. The reduced performance like the long waiting times for retrieval of the latest data from the Internet and the slow response due to the delay between switching contexts and modules in the applications are other reasons that make users unsatisfied. The users also dislike when the applications do not keep the user’s credentials, and hence, require frequent re-logging when accessed after certain period.
4 Desirable behavior of the mobile applications
Judging according to the contents of their reviews, the users of the analyzed mobile application would appreciate quick response with accurate information. They would like to receive notifications when important information, like sched- ules, grades or new materials are added or changed. Another feature required by the users is accessing the university mail from the mobile application that would provide attaching files from their mobile devices. Users also find very use- ful the trend of opening everything as part of the application, without using the default browser for accessing external links. As an overall overview of the applications, the users require consistent access provided by a stabile application that can run smoothly with fast switching between screens and contexts. The most important requirement is low memory utilization and uninterrupted access to all functionalities, especially the basic ones like search, maps, navigation and local public transport. Another important requirement by the users is the func- tionality where authenticated users can communicate and collaborate with other students and teachers.
54 54 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
5 Conclusions
This paper presents a brief overview of the mobile student services and their basic functionalities. Therefore, a set of 50 world class universities were consid- ered and their mobile applications were examined. The applications were cate- gorized in to major groups based on whether they require authentication or not. The services of these applications were further categorized according their basic functionalities and analyzed. To quantitatively measure the influence of these features ratings, number of downloads and comments data was collected from the mobile application markets for the most common mobile operating systems for two different periods: before 2016 and before 2018. We defined four measures of the users’ satisfaction and used them to analyze the influence of the features of the applications on their overall rating. The results show that the features related to non-educational activities like sports, multimedia and dining enjoy highest popularity with a slight change of the rank in the different timeperiods. We also show that the overall popularity of the applications used to be related to the university ranking in the past years, while in the recent years, the univer- sity ranking has nothing to do with its mobile application attractiveness. The analysis of the user comments additionally shows that the students used to share more positive experience with the services offered by the applications, however, after their latest updates, students are considerably unsatisfied with their sta- bility and outdated information. They also criticize certain feature and require new updates where the issues are fixed. Nevertheless, there are still universities that offer great applications with features that make the life of the students eas- ier, especially those who are at the beginning of their studies. The results from this paper can serve as guideline for the university applications developers that will give the essential features for a state-of-the-art student mobile application, emphasizing the importance and influence of each feature and functionality for gaining satisfied users. This task is becoming a bigger challenge taking into con- sideration the fact that the smart mobile devices are becoming more powerful, and at the same time, new free applications are emerging on the market that of- fer so many features that become a main environment for information retrieval, communication and navigation.
References
1. Concordia online education, “the u.s. digtal consumer report”, https://www.nielsen.com/us/en/insights/reports/2014/the-us-digital-consumer- report.html 2. Pew research center, “about a quarter of u.s. adults say they are ‘almost con-stantly’ online”, http://www.pewresearch.org/fact-tank/2018/03/14/about-a- quarter-of-americans-report-going-online-almost-constantly 3. Pew research center, “mobile fact sheet”, http://www.pewinternet.org/fact- sheet/mobile/ 4. U.s. smartphone use in 2015, https://www.pewinternet.org/2015/04/01/us- smartphone-use-in-2015/
55 55 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
5. Aakhus, M.A., Katz, J.E.: Perpetual contact: Mobile communication, private talk, public performance. Cambridge University Press (2002) 6. Aldrich, A.W.: Universities and libraries move to the mobile web. Educause Quar- terly 33(2), 5 (2010) 7. Alfawareh, H.M., Jusoh, S.: The use and effects of smartphones in higher education. International Journal of Interactive Mobile Technologies (iJIM) 11(6), 103–111 (2017) 8. Alwraikat, M.A.: Smartphones as a new paradigm in higher education overcoming obstacles. International Journal of Interactive Mobile Technologies (iJIM) 11(4), 114–135 (2017) 9. Asif, M., Krogstie, J.: Mobile student information system. Campus-Wide Informa- tion Systems 28(1), 5–15 (2011) 10. Bowen, K., Pistilli, M.D.: Student preferences for mobile app usage. Research Bulletin)(Louisville, CO: EDUCAUSE Center for Applied Research, forthcoming), available from http://www. educause. edu/ecar (2012) 11. Clayton, K., Murphy, A.: Smartphone apps in education: Students create videos to teach smartphone use as tool for learning. Journal of Media Literacy Education 8(2), 99–109 (2016) 12. Delialioglu, O.,¨ Alioon, Y.: Student preferences for m-learning application char- acteristics. International Association for Development of the Information Society (2014) 13. Ekanayake, S.Y., Wishart, J.: Mobile phone images and video in science teaching and learning. Learning, Media and Technology 39(2), 229–249 (2014) 14. Gikas, J., Grant, M.M.: Mobile computing devices in higher education: Student perspectives on learning with cellphones, smartphones & social media. The Internet and Higher Education 19,18–26(2013) 15. Jesse, G.R.: Smartphone and app usage among college students: Using smart- phones effectively for social and educational needs. In: Proceedings of the EDSIG Conference. p. n3424 (2015) 16. Kretovics, M.: Commuter students, online services, and online communities. New Directions for Student Services 2015(150), 69–78 (2015) 17. Mayuran Sivakumaran, P.I.: The mobile economy 2018. Tech. rep., Groupe Speciale Mobile Association (2018) 18. Miao, G.: The construction and development of university mobile learning system. In: 2013 6th International Conference on Information Management,Innovation Management and Industrial Engineering. vol. 2, pp. 588–590. IEEE (2013) 19. Mohd Suki, N.: Students’ dependence on smart phones: The influence of so- cial needs, social influences and convenience. Campus-Wide Information Systems 30(2), 124–134 (2013) 20. Pechenkina, E.: Developing a typology of mobile apps in higher education: A na- tional case-study. Australasian Journal of Educational Technology 33(4) (2017) 21. Roberts, P., Dunworth, K.: Staff and student perceptions of support services for international students in higher education: A case study. Journal of Higher Edu- cation Policy and Management 34(5), 517–528 (2012) 22. Rowley, J., Banwell, L., Childs, S., Gannon-Leary, P., Lonsdale, R., Urquhart, C., Armstrong, C.: User behaviour in relation to electronic information services within the uk higher education academic community. Journal of Educational Media 27(3), 107–122 (2002) 23. Shilpi Taneja, A.G.: A mobile app architecture for student information system. International Journal of Web Applications 7(2), 56–63 (2015)
56 56 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
24. Wilson, S., McCarthy, G.: The mobile university: from the library to the campus. Reference Services Review 38(2), 214–232 (2010)
57 57 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Rating and Geo-Mapping of Travel Agencies that Provide Vacations to Long-Distance Destinations from Skopje, North Macedonia
Dina Gitova, Elena M. Jovanovska, Andreja Naumoski
Ss. Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering, Skopje, North Macedonia [email protected], [email protected], [email protected]
Abstract. Specialized niche services like providing offers to long-distance des- tinations, can be considered a determining strategy for travel agencies to ac- quire market share, differentiate themselves from competitors and to add higher value to the customer. The goal of this paper is to identify, rate and map the travel agencies in Skopje, North Macedonia that offer trips to long-distance des- tinations, to provide basic information about them and to derive a ranking sys- tem that would proportionally represent them accordingly to the user ratings and number of offers. Travel agencies that fulfill a pre-determined set of criteria are weighted according to user rating and the number of users that left a rating. Furthermore, to include travel agencies without user ratings, who provided the service, a ranking from the number of offers available was presented. Moreo- ver, the creation of a geographical database will provide the customers with a sorted list of useful information with regards to the expected service, visually presenting the aggregated data. The visual-spatial elaboration of the evaluation is represented with the software ArcGIS. Implementing the results of the re- search in other map applications, with a special focus on mobile technologies, can account for the transparent ranking of the travel agencies.
Keywords: Travel agencies ∙ Long-Distance Vacations ∙ ArcGIS.
1 Introduction
Travel agencies are continually repositioning the provided services to fit in the chang- ing tourism industry. The internet as a platform for sharing experiences has shifted the demand from mass-market oriented travel arrangements that contain standard packag- es, to individually customized niche services. The business model of the travel agen- cies must be adapted for the current demand for offers that include distinguished trav- el packages to long-distance destinations. Moreover, marketing strategies are influ- enced by the internet as a distribution channel and e-intermediary for interaction with the target customers and a wider audience. Furthermore, an updated web page serves the purpose of a new virtual office.
58 58 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Having understood the taste of the global traveler who is interested in unique travel experience, the goal of this paper is to represent the travel agencies in Skopje, North Macedonia that provide vacations to unusual destinations proportionally to the num- ber of offers as well as the user rating left on Google.com. According to the data from the State Statistical Office of North Macedonia, 51,99% of the population in the capital city Skopje has been on a vacation in 2016 [1]. To find the best vacation package, the travelers have at their disposal 190 travel agencies in the city of Skopje – however many of them offer arrangements only in the neighboring countries. Since the goal of this paper is to find out how many travel agencies in Skopje, provide specialized niche offers to long-travel destinations, the e- channel used for selection of the agencies, was the travel agency’s web-site, con- sistent with many studies that use website evaluation frameworks and criteria [2, 3]. Additionally, the customers use the website for travel planning and to seek reassur- ance from review sites. Moreover, a qualitative analysis from the available infor- mation was implemented i.e. a selection process framework of pre-determined criteria was developed to ensure the accuracy of the data found and to create a ranking sys- tem. This data can be used for travel pattern discovery and market potential analysis, like [4]. Furthermore, recommended systems can be built upon this data to further improve the analysis [5, 6]. The model was expanded using Geographic Information System (GIS) that satis- fies the need for making, handling and picturing geodata. With the aim of integrating tourism related information and adding a spatial component in interpreting the find- ings, as well as including different possibilities for implementation of the research, one of the most popular GIS software was used, ArcGIS by Esri. Additionally, geo- spatial visualization via this software was used in identifying trends that were propor- tionally represented. The research results will enable travelers to quickly and easily find travel agencies that offer trips to long-distance destinations together with all the relevant information regarding location and contact data such as e-mail, phone and the website. Enlisting and ranking of the travel agencies that offer long-distance trips will ease the search of the perfect vacation in a new destination. Moreover, the results of the research can be further developed to take part in mobile apps that are using maps to better represent various places in a city. Having introduced the research background, the remaining sections will provide a more detailed path of the created database with respect to the analyzed factors. After- ward, the connection between the database and the GIS Software will be elaborated, explaining the procedure in detail. Subsequently, the weighed modelling approach as a theoretical backbone of the model, together with the interpolation and the symbolo- gy of the modelling in the GIS software will account for the results of the study. In conclusion, the findings of the study together with the future possibilities of develop- ment and use of the research will be presented.
59 59 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
2 Method and materials
2.1 Strategic Selection Framework
Stage 1: Data Preprocessing
1. Identification of all the travel agencies in Skopje 2. Analysis of data accuracy through website information 3. Selecting the agencies that provide travel arrangements to long-distance destina- tions
Stage 2: Data storage and management
1. Filling in the table with the required attributes that allow for the identification of travel agencies and the number of different offers to the previously described des- tinations 2. Weighing the data according to the user ratings
Stage 3: Data analysis with GIS 1. Interpolation and symbology of the data according to: 2. Number of different offers 3. User ratings
2.2 Data Management
The source of the data used in this paper is the website Zlatna Kniga [7]. In Skopje, according to this website, there are 190 travel agencies. The credibility of the provid- ed data is verified through finding the website of each travel agency, and once con- firmed, is followed by an analysis of the provided arrangements to long-distance des- tinations. In this phase, the travel agencies that do not have a website or do not have valid in- formation about the offered packages, are discarded, which led to 37 travel agencies that offer trips to long-distance destinations. For each travel agency, the following attributes are entered in table:
Name – the name of the travel agency. Address – address of the travel agency. Address comment – additional comment about the location of the travel agency. Phone - the travel agency's phone number. Working hours – weekday and weekend work hours. Destinations – all the long-distance destinations that the travel agency is offering. Email - the travel agency's email address. Web page – the travel agency’s website. Longitude - degrees of longitude in decimals. Latitude – degrees of latitude in decimals. Rating Score – rating grade from online users. Users – the number of users that rated the travel agency.
60 60 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Weighed – the calculated weight of the review score (weighed with 0.3) and number of users that rate the travel agency (weighed with 0.7).
2.3 Connecting the Map with the Data in ArcGIS
To create the visual representation of the travel agencies, the ArcMap tool, part of the ArcGIS software package, was used. For this project, as a base map, we used the geographical map of Skopje which was imported by selecting the option Imagery with labels. On the map of Skopje with the municipality labels, a layer of street names was added from ArcGIS Online, another product by Esri, to showcase the exact locations of the travel agencies. Regarding which information is currently needed, these layers can be turned on or off. We export the database from the table into the geodatabase and the map. The locations of the 37 agencies are displayed on the map using the longitude and latitude information (see Fig. 1). For an accurate representation of the locations on the map, the coordinate system WGS_1984_Web_Mercator_Auxiliary_Sphere is changed to WGS_1984.
Fig. 1. Representation of the travel agencies on the map, including labels with their names.
61 61 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Coordinates are taken from the website Map Coordinates [8] for the locations available on Google Maps, and the coordinates for travel agencies that are not listed on Google Maps have been manually entered from the same website. To ensure high- er visibility, the symbols can be changed to any desired icon with the option Symbol- ogy. The chosen icon in this layer where we display the locations of all travel agen- cies is an airplane, as a representation of the travel agencies. Additionally, the labels are added via the Labelling toolbar and the Maplex Label Engine. To change the style, size, font and placement of the labels, we use Label Manager. Because there are 37 agencies, we can see the overlapping of the multiple labels. To ensure higher visibility, the Scale Range can be set to not show the labels when zoomed out from the map with a scale higher than 1:12,500. More information needs to be provided to the users besides just the locations of the agencies and the labels. To avoid cluttering the visualization more, a hover function is added via Display expressions, which shows the name of the agency and all destina- tions that it offers, as it is shown with Fig. 1.
2.4 Weight Modeling Approach
The weight modelling approach was used to provide a rating that will balance the different number of users derived by weighing the user rating score with 0,3 and the number of users that gave that rating with a weight of 0,7. Consequently, the average rating of a higher number of users will outweigh the rating of the smaller sample. The formula used is:
� � �����_����� = � + � (1) max � 1 max � 2
where n, m represents the review score and the number of user reviews respectively and w1 and w2 represent the weights. However, for the travel agencies who don’t have a lot of user ratings, there is another interpolation (see below Interpolation and symbology) that will showcase the travel agencies appropriately to the number of offers presented on the website. For the interpolation in this research, we used two criteria: the number of destina- tions offered by travel agencies, and the total score from user reviews. Firstly, travel agencies that offer a higher number of choices are rated with the highest visibility to benefit the customer who values diversity. Moreover, this will enable the representa- tion of the travel agencies that have no user rating. Secondly, another interpolation was performed based on the factors: user rating (on a scale from 0 to 5, 0 being no rating, and 5 being the highest rating) and several users that rated the travel agencies. The data is weighed to enable evening out of the difference in sample size i.e. ensur- ing comparable data. Furthermore, this interpolation shows the customers which are the best places to visit according to user reviews. To ensure the representation of the results is within the borders of Skopje, a shapefile of North Macedonia is added to the project [9] and from the attribute’s table, all the municipalities except Skopje are erased. This creates a shapefile with only the borders of Skopje. The result is shown in Fig. 2.
62 62 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Fig. 2. Shapefile representing the borders of Skopje as well the travel agencies
This shapefile is set up as a coordinate system with the new data and is exported as a layer on the map. The interpolation analysis is done using the Interpolation option in the Geoprocessing menu and the number of classes and selection of color are setup in the Layer Properties menu. In the new layer that was created for displaying the first interpolation with respect to the number of destinations a travel agency provides, the symbology can be changed from simple points to another classification type. The symbols have been presented according to the number of destinations offered by the travel agency, i.e. the higher the number of destinations, the bigger the symbol. This provides better visualization for the end user. The number of classes and the interval, as well as the symbol are random, in the case of interpolation settings. The color combination chosen for representation is Purple to Green Diverging, Bright. In this layer, the hover function, as well as labels, are added to enable better recognition. For the second interpolation, a rating of the travel agencies and the number of users who rated them according to Google is used. The user ratings are adjusted according to the number of users, as explained in the Weight Modeling Approach section. Interpola- tion is executed as stated in the above-mentioned procedure, within Skopje’s borders. For higher visibility, 7 classes and the colors from red to green are selected, starting from the lowest rating marked with red, to the highest rating marked with green.
3 GIS Modelling Results
The result of this project is a map with three layers. The first layer is a simple repre- sentation of the locations of all travel agencies included in the database, with the ap- propriate details. In the second layer, we can see the interpolation based on the diver- sity in destinations a travel agency provides. This is combined with the symbols that differ in size for increased visibility and easier recognizing of the agencies with the most choice for the user, shown with Fig. 3.
63 63 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Fig. 3. Interpolation based on destinations offered (Top5 travel agencies are labeled)
Since almost all these agencies are found in the city Centre, the interpolation can be seen best when zoomed in, as shown in Fig. 4.
Fig. 4. Interpolation results based on the total number of long-distance destinations offered in ArcGIS
64 64 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
The third and final layer (Fig. 5) shows the second interpolation based on user re- views. Here, the best travel agencies that offer the best services are shown in green, whereas the worst travel agencies are shown in red. As stated in the previous para- graph, the results are seen well when the map is zoomed in the city center.
Fig. 5. Interpolation results in the city center based on user reviews using ArcGIS (R is the user satisfactory rating, while WR is weighted rating based on (1))
According to the interpolation results, Savana Travel Agency rated as number 1 travel agency with a weighted rating of 0.958, followed by the Atlas plus (WR = 0.543). While on the other hand, neither of these travel agencies offers more than 10 long-distance destinations, they offer the best service according to the users.
4 Conclusion
With this research paper an independent evaluation of website data from travel agen- cies in Skopje, North Macedonia was presented, with the focus on rating the agencies that offer arrangements to long-distance destinations. The accuracy of the available information has been verified, followed by a ranking system based on user ratings as well as the number of offers provided. A weighing model used here, for user satisfac- tory rating, allowed comparability of the data and more inside information reading the relationship in a geographical sense. By using the ArcGIS software, we can indicate which agencies in a geographical sense in the city of Skopje offer the best service
65 65 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
based on user reviews, and at the same time offering the highest number of long- distance vacations. Moreover, the above-mentioned travel agencies to the offered services were located on geo-mapping software. Explanation of the procedure in the software such as further filtering as per user needs allows for more personalized use, such as sub-selection according to the variety of choice of offered destination and user rating. Furthermore, all the data needed for contacting the agencies via the internet or in person is readily available. Finally, the research can be implemented in mobile applications that use maps or in algorithms for independent ranking. The customer will benefit from the readily available, independently rated data for future decision making when planning a trip.
Acknowledgment This work was partially financed by the Faculty of Computer Science and Engi- neering at the Ss. Cyril and Methodius University in Skopje.
References
1. Travels of the domestic population – statistical review: Transport, tourism and other ser- vices, 2016, State statistical office of Republic of Macedonia, http://www.stat.gov.mk/Publikacii/8.4.17.03.pdf, last access 2018/12/20. 2. Chiou, W. C., Lin, C. C., & Perng, C.: A strategic framework for website evaluation based on a review of the literature from 1995–2006. Information & Management, 47(5-6), 282- 290, (2010). 3. Stojchevska, M., Naumoski, A. and Mitreski, K.: Modelling the Impact of the Hotel Facili- ties on Online Hotel Review Score for the City of Skopje. In: 2018 2nd International Sym- posium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), 1-5, IEEE, (2018). 4. Bazazo, I. K., & Khanfar, S. M. Using (GIS9. 3 Software) to Visualize Travel Patterns and Market Potentials of Aqaba City in Jordan, (2014). 5. Gao, M., Fu, Y., Chen, Y. and Jiang, F.: User-Weight Model for Item-based Recommenda- tion Systems. JSW, 7(9), 2133-2140, (2012). 6. Symeonidis, P., Nanopoulos, A. and Manolopoulos, Y.: Feature-weighted user model for recommender systems. In International Conference on User Modeling, 97-106, Springer, Berlin, Heidelberg, (2007). 7. Zlatina Kniga – Travel agencies in Skopje - https://zk.mk/search/?what=туристичка+агенција&filter_city=15457, last accessed 2018/11/11. 8. Map Coordinates - http://www.mapcoordinates.net, last accessed 2018/10/12. 9. Shapefile of Macedonia taken from http://download.geofabrik.de/europe/macedonia.html, last accessed 2018/10/12.
66 66 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Using continuous integration principles for managing virtual laboratories
Dejan Stamenov, Dimitar Venov, Goran Petkovski, Boro Jakimovski, and Goran Velinov
Faculty of Computer Science and Engineering, Skopje, Republic of North Macedonia, Ss. Cyril and Methodius University [email protected], [email protected], {goran.petkovski,boro.jakimovski,goran.velinov}@finki.ukim.mk
Abstract. Researchers around the world are involved in the process of managing and processing large volumes of data while using a plethora of different tools. This increases the need for virtual laboratoriesthat will allow scientists to create an environment for rapid data processing and analyses, using the latest available technology withoutspendingtoo much time on configuration. In this paper, we present the Virtual Labo- ratory System (vLab) for managing laboratories on the cloud. The sys- tem is based on the service-oriented architecture, built on top of Spring Framework and hosted on the EGI Federated Cloud (FedCloud). It pro- vides an interface for managing clusters of virtual machines that are created by the service, providing tools for data processing and analysis, along with the required datasets for these operations. The architecture of the system and the tools used to achieve successful management of virtual laboratories are also presented with the details of the cloud ar- chitecture where the vLab system is hosted.
Keywords: FedCloud · Virtual laboratory · Virtual organization · Web service.
1 Introduction
The data scientists around the world come from established fields such as statis- tics, engineering, physics, machine learning, business intelligence, neuroscience and more, having special competences (skills and knowledge) that enables them to extract value from digital data. The scientific communities are involved in creating and maintaining data products, in which the end users contribute to- wards improving the product. Users are increasingly interacting with real-time data and services, thus generating lots of data. The science implies a holistic approach and uses scientific methods to extract value from data. This raises the need for virtual laboratories that will allow scientists to create an environment for rapid data processing and analyzes, using the latest available technology. In this paper we present our Virtual Laboratory System (vLab), a service- oriented architecture hosted on the EGI Federated Cloud (FedCloud) [3] that
67 67 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
provides a service for creating and managing virtual laboratories. OpenNeb- ula is used as a cloud computing platform on the data center in our solution, implementing the Open Cloud Computing Interface (OCCI) for managing and monitoring cloud resources. Our system features an Application Programming Interface (API), which provides an automation in the process of managing com- putational resources to the end users. Up until this point, it was required for a cloud administrator to manually set up the cluster of virtual machines, and then apply proper configuration before it can be used by researches. Withthis service, we provide the possibility for each data scientist to create clusters of virtual machines in the cloud without need for a dedicated cloud administrator. The clusters of virtual machines that are created by the service provide tools for data processing and analyses, along with the required datasets. The virtual machines in the cluster are automatically provisioned and configured using Pup- pet [7]. The Hadoop [1] open-source software framework is used in the virtual machines for distributed storage and processing of big data. The paper structure is as follows. Related work is presented in Section 2. The FedCloud structure is provided in Section 3, while the implementation of the vLab service is explained in detail in Section 4. The Section 5 describes the process of automating the deployment of Hadoop cluster using Puppet. Our use case application is provided in Section 6, while concluding the paper contribution and presenting the future work in Section 7.
2Relatedwork
Atlas Tier 3 Cluster Tool (at3c) consists of remote repository for checking cluster parameters and configuration of hardware environment based on real-world use scenarios. Main advantage is quick recovery from undesirable situations (failed hardware, shortage of sustainable environment, etc.) [4], as opposed to our sys- tem which provides automatic configuration only of Hadoop environments. Another collaborative tool for building common e-Infrastructures is CHAIN- REDS project. Their existing grid infrastructure contains materials from re- searches around the world, with mission to provide computational resources to scientists and organizations. Its integration with EGI (European Grid Infras- tructure) and FedCloud has managed to produce functional solution for storing cloud images and templates. Our service-oriented architecture shares similari- ties with this solution, having CHAIN-REDS with one advantage: creation and management of mini-clouds in different VRC (Virtual Research Communities) from all continents in the world [11], with difference to our system which give access only to EGI’s infrastructure.
3 FedCloud structure
EGI Federated Cloud represents heterogeneous research community with aim to provide distributed collection of computational resources. Integrates public and community clouds into a scalable computing platform for data and/or compute
68 68 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
driven applications and services. Each cloud can provide Infrastructure as a Ser- vice (IaaS) features for users, i.e. CPU/GPU computing, storage and network configuration. There are Virtual Organizations (VOs) formed within the EGI Federated Cloud. Each VO of the cloud federation is a resource allocation for a specific community or purpose. The cloud providers who support a given com- munity join the VO dedicated to that community. Users and their applications can consume cloud resources from the VO after becoming members of the same VO. X.509 certificates are used to control the access to the cloud resources.
4 vLab service
The vLab service is the main part of the cloud architecture which provides computational services to the end users. It provides services for creation of com- putational resources1, along with services to remove these resources and listings of available computational resources for each end user. Each computational re- source is created based on a configuration, which includes different specifications for the virtual machines that are being created (in terms of computational power, storage and memory), along with the number of virtual machines being created (these virtual machines form independent cluster) and required configuration files by which the virtual machines in the cluster will be configured to be aware of the existence of other machines in the cluster itself.
4.1 Technical overview
The API (Application Program Interface) is built with Java technology and Spring Framework [12], using the REST architecture (used over HTTP protocol). vLab system API does not directly communicate with the cloud; an intermediate framework jOCCI [5] is used, which provides the functionality of accessing differ- ent instances of the cloud when doing computational services on user’s request. In addition to this, PostgreSQL [6] is used as a main database of the API, storing virtual machine configurations, user’s tokens and the computational resources that are created for each token. The token represents unique set of characters for each user of the cloud, which is sent to the API when requesting a service by the end users. Because every virtual machine created by the API needs additional configuration to become part of a cluster, the Git repository management system - GitLab is used to publish configuration data. This repository is managed by the API itself; when new computational resources are created, configuration files are published in the repository. When resources are removed from the cloud, the configuration data for these resources is removed from the repository. Puppet is used as an orchestrational tool for automatic deployment and configuration of parameters in Hadoop cluster. Description is stored in Git repository,sowhen users apply the puppet template it’s automatically pulled from the repository
1 ”Computational resources” or ”virtual machines”, both terms have the same mean- ing further in the paper.
69 69 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Repository manager
Apply configuration for each virtual machine in the cluster Gitlab
Publish virtual machine configuration Puppet
Request virtual laboratory vLab API Create cluster of virtual machines on the cloud Researchers Laboratory cluster of virtual machines
FedCloud
Fig. 1. vLab system - architectural overview. and provision into FedCloud. In this way, users from different background can create custom-related scenarios for using cluster management tool (in our paper is described Hadoop cluster) with management of related git repository. The FedCloud VMs are developed on Open Nebula Cloud infrastructure. On Fig. 1 is given the architectural overview.
4.2 API interface
The vLab system API provides multiple interface calls to meet the requirements for management of computational resources on the cloud. The most important API call is the one that creates clusters of virtual ma- chines on the cloud - POST: resource/createresource. This API call has additional required parameters:
– userToken - unique set of characters for each user of the cloud, used to identify the user, – configuration - computational specification for each virtual machine that is being created as part of a cluster (in terms of computational power, storage and memory), and – key - SSH key which will be used by the user to connect to the virtual machine.
After the virtual machines are created, the API call - GET: resource/listresources can be used to retrieve the IP addresses of the computational resources that were created. This API call has the following required parameters:
– userToken - used to identify which virtual machines are created for the given user token, and then returns the IP addresses of these computational resources, and
70 70 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
– configuration - one user may have different number of virtual machines created on the cloud. Each virtual machine is created based on a given con- figuration; this parameter is used to filter the machines which IP addresses will be returned by the API call.
When the computational resources created by the user are not needed, the API call - DELETE: resource/deleteresource will delete all the resources, based on the required parameter: userToken.
4.3 PostgreSQL database design
PostgreSQL is used as the main database of the API, storing configuration data about the clusters of virtual machines and also, keeping track of the user’s cloud resources based on their tokens. On Fig. 2 is presented an overview of the database design. The VM CONFIGURATION table is used to store data about different clus- ter setups. For example, here we keep data on how we need to setup all the virtual machines in a given cluster when the configuration of these virtual machines is based on Hadoop. This allows us to define different setups in the future, so that the API will be able to provision virtual machines in the cluster based on the provided data in the table. The USER TOKENS table is used to store each token that is sent by the user, while also tracking down the number of the resources that are being used by the user. The RESOURCE DETAILS table references the USER TOKENS table, while also storing data about the virtual machines that are created for the user. This is done to lower the load of the cloud; if we don’t store this data in the database and then retrieve it when needed, we would have to query the cloud for each user to get the details about each virtual machine based on user’s request. The integrity of this data is maintained based on each action of the user, meaning that when the user removes the cluster or creates a new one, all the data stored in the database is also removed or new data is added.
5 Hadoop deployment with Puppet
In this section we describe the process of automating the deployment of Hadoop cluster using the Puppet configuration management tool. First, in Subsection 5.1 we give short introduction of Hadoop, then in Subsection 5.2 we give short introduction of Puppet and finally, in Subsection 5.3 we provide the details of the deployment process.
5.1 Hadoop
Hadoop [1] is an open source software framework for large scale distributed data storage and data analyses [13]. The framework is designed to be able to scale from single machine to thousands of machines, where each machine offers local
71 71 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
VM_CONFIGURATION USER_TOKENS
PK configuration_id PK user_token
virtual_organization number_of_resources
virtual_machine_template
virtual_machine_template_size RESOURCE_DETAILS
virtual_machine_memory PK resource_id
virtual_machine_cpu FK user_token_id
number_virtual_machines resource_url
resource_ip
vm_type
virtual_organization
Fig. 2. vLab API database design. storage and computation. Hadoop follows the master-slave architecture, where a cluster is composed of one NameNode (the master) and one or more DataNodes (the slaves). The NameNode manages the file system namespace and regulates access to files by clients, and the DataNodes manage storage and computation. The main goals of Hadoop are quick detection of faults and automatic recov- ery from them, high throughput of data access rather than low latency of data access, support for large files in the order of terabytes and support for tens of millions of files in a single instance [2]. The framework includes the following modules: – Hadoop Common - Common utilities that support the other modules, – Hadoop Distributed File System (HDFS) -Adistributedfilesystem that provides high-throughput access, – Hadoop YARN - A framework for job scheduling and cluster resource management, and – Hadoop MapReduce - Implementation of the MapReduce programming model for processing large data sets.
5.2 Puppet Puppet [7] is a configuration management tool which is written in the Ruby [10] programming language and allows us to control and automate the configura- tion of all elements comprising an IT system, such as hosts, installed software,
72 72 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Fig. 3. vLab system - web interface overview. users, running services, configuration files, scheduled tasks, storage, monitoring and security. Through Puppet, we can describe the system desired end state either using Ruby or a Ruby domain specific language (DSL). Puppet based infrastructure generally consists of a master server with agents installed on each client/node (master-slave architecture).
5.3 Automatic Deployment of Hadoop With Puppet
Deploying Hadoop consists of unpacking the desired version of Hadoop on all of the machines in the cluster in the desired place. Then, setting up the con- figuration files to include the addresses of the NameNode (the master), andthe addresses of the DataNodes (the slaves), as well as some additional configuration parameters. Then, we need to enable communication without password between the NameNode and all of the DataNodes, which include a creation of a special user for Hadoop. Finally, we need to format HDFS and start Hadoop from the NameNode. To be able to automate this, we have developed a Puppet module which is composed of multiple Puppet classes that are supposed to deploy Hadoop on all of the machines in the cluster. One of the challenges that we faced was synchronization between all of the Puppet agents. In order to start Hadoop,we have to first deploy Hadoop on all of the nodes, and then start it. But, because Puppet catalogs are executed on Puppet’s agents independently, the NameNode Puppet agent doesn’t know whether Hadoop has been deployed on all of the other servers. To achieve this synchronization, we have used PuppetDB [9] with Facter [8], and through PuppetDB they are accessible to all of the Puppet agents. The NameNode Puppet agent is reading these facts and as soon as all of the DataNodes are set up, it continues with its execution, which includes formating HDFS and starting the cluster.
6 Use case application
As part of developing and contribution to the EGI project, we have developed web application which allows the data scientists to create virtual laboratory
73 73 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Fig. 4. vLab system - available clusters.
Fig. 5. vLab system - IP addresses from the machines in the cluster.
environments using Hadoop, while effectively managing its instances. Users are authenticated with x509 certificates that are used in EGI services.
On Fig. 3 is displayed the default interface of the application. On Fig. 4 is given the option for selection of virtual laboratories provided by the web application. When the type of virtual laboratory is chosen, the application will ask for the public SSH key which will be used for signing in the created virtual machines. The users are able to choose the desired scale of the cluster in the cloud. Finally, after successful provision of machines, the web application is providing the IP addresses of every instance part of the created cluster, as shown on Fig. 5. These IP addresses are public, which means the users can sign in from every device which has Internet connectivity. After finishing with their work, the cluser can be shut down with graceful poweroff signal sent to the interface of OpenNebula. Then, the virtual machines are terminated and get deleted from the list of available machines.
74 74 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
7 Conclusion and future work
In this paper we presented our service-oriented architecture for managing virtual laboratories - Virtual Laboratory System (vLab), hosted on the EGI Federated Cloud (FedCloud). The virtual machines created by the API provide tools for data processing and analysis. The details of the API were provided, along with the Puppet configuration, which provides automation in the process of deploy- ment of the Hadoop machines. The EGI Federated Cloud structure was also provided. In future we are planning to conduct researches where we would test if the Virtual Laboratory System (vLab) is capable of being deployed on other cloud providers like Google Cloud Platform, Elastic Compute Cloud and Microsoft Azure, and if it is not, what changes would be required to make the system cross-platform. Also, we are planning to extend our service-oriented architecture to include other laboratories, for example: Apache Spark, and to provide more configuration options for all of the existing laboratories.
References
1. Apache: Hadoop documentation (2019), http://hadoop.apache.org/docs/current/ 2. Borthakur, D.: The hadoop distributed file system: Architecture anddesign(2008) 3.Foundation, T.E.: The egi federated cloud (2019), https://www.egi.eu/federation/egi-federated-cloud/ 4. Hendrix, V., Benjamin, D., Yao, Y.: Scientific cluster deployment and recovery– using puppet to simplify cluster management. In: Journal of Physics: Conference Series. vol. 396, p. 042027. IOP Publishing (2012) 5. Kimle, M.: jocci api client library (2019), https://github.com/Misenko/jOCCI-api 6. PostgreSQL: Postgresql documentation (2019), https://www.postgresql.org/docs/ 7. Puppet: Puppet documentation (2019), https://puppet.com/docs/ 8. Puppet: Puppet facter documentation (2019), https://docs.puppet.com/facter/ 9. Puppet: Puppetdb documentation (2019), https://docs.puppet.com/puppetdb/ 10. Ruby: Ruby documentation (2019), https://www.ruby- lang.org/en/documentation/ 11. Ruggieri, F., Andronico, G., Barbera, R., Matyska, L.: Introduction to the chain- reds project (objectives and achievements). In: e-Infrastructures for e-Sciences 2013 A CHAIN-REDS Workshop organised under the aegis of the European Commis- sion. vol. 199, p. 001. SISSA Medialab (2014) 12. Software, P.: Spring framework (2019), https://spring.io/ 13. Taylor, R.C.: An overview of the hadoop/mapreduce/hbase framework and its current applications in bioinformatics. In: BMC bioinformatics. vol. 11, p. S1. BioMed Central (2010)
75 75 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Visualization of the nets of type of Zaremba−Halton constructed in generalized number system⋆
V. Dimitrievska Ristovska1, V. Grozdanov2, and Ts. Petrova2
1 Faculty of Computer Science and Engineering, University ”SS. Cyril and Methodius”, 16 Rugjer Boshkovikj str. 1000, Skopje, Macedonia [email protected] 2 Department of Mathematics, Faculty of Natural Sciences and Mathematics, South West University ”Neofit Rilski”, 66 Ivan Michailov str., 2700 Blagoevgrad, Bulgaria [email protected],[email protected]
Zκ,µ Abstract. In the present paper a class of two-dimensional nets B2,ν of type of Zaremba-Halton constructed in generalized B2−adic system is introduced. In order to show their very well uniform distribution, we made their visualization with mathematical software Mathematica.
In our paper ”On the (VilB2 ; α; γ)−diaphony of the nets of type of Zarem- ba - Halton constructed in generalized number system” (in preparation) Zκ,µ we constructed a class B2,ν of two-dimensional nets (throughout this paper the term net will denote finite sequence) of type of Zaremba-
Halton. Also, the (VilB2 ; α; γ)−diaphony which is based on using two- dimensional Vilenkin functions constructed in the same B2−adic system, Zκ,µ of the nets of the class B2,ν is investigated. The obtained results have theoretical character and treat to the influence of the parameter α to
the exact order of the (VilB2 ; α; γ)−diaphony of the nets from the class Zκ,µ B2,ν . The purpose of this paper is to present in extended form the visualization Zκ,µ of some concrete nets from the class B2,ν and to show the distribution of the points of these nets. In this sense the reader can understand the distribution properties of the considered nets. To construct netsofthe Zκ,µ class B2,ν we use the mathematical software Mathematica. Our interest for the importance of these types of sequences is implied from their usage in numerical integration, construction of random num- ber generators e.t.c.
Keywords: Nets of type of Zaremba-Halton · Visualization.
1 Introduction
Let s 1 be a fixed integer which will denote the dimension throughout the paper.≥ The functions of some classes of complete orthonormal systems overthe
⋆ Supported by Faculty of Computer Science and Engineering at ”Ss Cyril and Methodius” Univ. in Skopje
76 76 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
s dimensional unit cube [0, 1)s are used with a big success as an analytical tool for− investigation of the distribution of sequences and nets. Also, thesesystems stand on the basis of the definitions of some quantitative measures which show the quality of the distribution of sequences and nets. We will remind the def- initions of the functions of some complete orthonormal function systems.The first example is the trigonometric system. For an arbitrary integer k the func- 2πikx tion ek :[0, 1) C is defined as ek(x)=e ,x [0, 1). For an arbi- → s ∈ s trary vector k =(k1,...,ks) Z the function ek :[0, 1) C is defined s ∈ → s as ek(x)= ek (xj), x =(x ,...,xs) [0, 1) . The set s = ek(x):k j 1 ∈ T { ∈ j=1 ! Zs, x [0, 1)s is called trigonometric function system. ∈ } Following Larcher, Niederreiter and Schmid [9] we will present the concept of the Walsh functions over finite groups. For this purpose let m 1 be a given ≥ integer and b1,b2,...,bm :1 l mbl 2 be a set of fixed integers. For { Z ≤ ≤ ≥ } 1 l m let bl = 0, 1,...,bl 1 the operation bl is the addition modulus ≤ ≤ Z { − } ⊕ bl and ( bl , bl ) is the discrete cyclic group of order bl. ⊕Z Z Let G = b1 ... b and the operation G =( b1 ,..., b ) is defined as: × × m ⊕ ⊕ ⊕ m for arbitrary elements g, y G, where g =(g ,...,gm) and y =(y ,...,ym), ∈ 1 1 let g G y =(g b1 y ,...,gm b ym). Then, the couple (G, G)isafinite ⊕ 1 ⊕ 1 ⊕ m ⊕ group of order b = b1 ...bm. For arbitrary g, y G of the above form the character function χg(y)is m∈ glyl defined as χg(y)= exp 2πi . Let ϕ : 0, 1,...,b 1 G be an bl { − } → l=1 " # arbitrary bijection which! satisfies the condition ϕ(0) = 0. For an arbitrary integer k 0 and a real x [0, 1) with the b adic rep- ν ≥ ∞ ∈ − i −i−1 resentations k = kib and x = xib , where ki,xi 0, 1,...,b 1 , ∈ { − } i=0 i=0 $ $ kν = 0 and for infinitely many of ixi = b 1, the k th Walsh function over ̸ ̸ − − the group G with respect to the bijection ϕ G,ϕwalk :[0, 1) C is defined as ν →
G,ϕwalk(x)= χϕ(ki)(ϕ(xi)). i=0 !N N Ns Let us denote 0 = 0 . For an arbitrary vector k =(k1,...,ks) 0 the ∪{ } s ∈ s function G,ϕwalk :[0, 1) C is defined as G,ϕwalk(x)= G,ϕwalk (xj), x = → j j=1 s N!s s (x1,...,xs) [0, 1) . The set G,ϕ = G,ϕwalk(x):k 0, x [0, 1) is called a system of Walsh∈ functions overW the group{ G with respect∈ to∈ the bijection} ϕ. In the case when m =1,G= Zb and ϕ =idistheidentitybetweenthe Z sets 0, 1,...,b 1 and b the obtained function system Zb,id is the system (b){ of the Walsh− } functions of order b which was proposedW by Chrestenson [1]. W If b =2then,thesystem Z2,id is the original system (2) of the Walsh [13] functions. W W The numerical measures for the irregularity of the distribution of sequences and nets in [0, 1)s give us the order of the inevitable deviation of a concrete
77 77 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
distribution from the ideal distribution. Generally, they are different kinds of the discrepancy and the diaphony. So, let ξN = x0, x1,...,xN−1 be an arbitrary net of N 1 points in [0, 1)s. { } ≥ s For an arbitrary vector x =(x ,...,xs) [0, 1) let us denote [0, x)= 1 ∈ [0,x ) ... [0,xs) and A(ξN ;[0, x)) = # xn :0 n N 1, xn [0, x) . 1 × × { ≤ ≤ − ∈ } The L discrepancy of the net ξN is defined as 2− 1 2 2 s A(ξN ;[0, x)) T (ξN )= xj dx1 ...dxs . ⎛ s ) N − ) ⎞ ([0,1] ) j=1 ) ) ! ) ⎜ ) ) ⎟ ⎝ ) ) ⎠ In 1976 Zinterhof [14] introduced) the notion of) the so - called classical di- aphony. So, the diaphony of the net ξN is defined as
1 2 N 2 1 −1 F ( ; ξ )= R−2(k) e (x ) , s N N k n T ⎛ s ) ) ⎞ k∈Z \{0} ) n=0 ) $ ) $ ) ⎝ ) ) ⎠ ) ) s s where for each vector k =(k ,...,ks) Z the coefficient R(k)= R(kj) and 1 ∈ j=1 for an arbitrary integer k !
1, if k =0, R(k)= k , if k =0. - | | ̸ In order to study sequences and nets constructed in b adic system, in the last twenty years, some new versions of the diaphony was defined.− These kinds of the diaphony are based on using complete orthonormal function systems constructed also in base b. In 2001 Grozdanov and Stoilova [5] used the system (b) of the Walsh function to introduce the concept the so-called b adic diaphony.W In − 2003 Grozdanov, Nikolova and Stoilova [4] used the system G,ϕ of the Walsh functions over the group G with respect to the bijection ϕWto introduce the concept of the so-called generalized b adic diaphony. So, the generalized b adic − − diaphony of the net ξN is defined as
1 2 N 2 1 1 −1 F ( G,ϕ; ξN )= ρ(k) G,ϕwalk(xn) , W ⎛(b + 1)s 1 N ⎞ Ns ) n ) − k∈ 0\{0} ) =0 ) $ ) $ ) ⎝ ) ) ⎠ ) s ) s where for a vector k =(k ,...,ks) N the coefficient ρ(k)= ρ(kj) and for 1 ∈ 0 j=1 an arbitrary non-negative integer k !
1, if k =0, ρ(k)= b−2g, if bg k bg+1,g 0,g Z. - ≤ ≤ ≥ ∈
78 78 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
In 1954 Roth [11] obtained a general lower bound of the L2 discrepancy of − s arbitrary net. He prove that for any net ξN composed of N points in [0, 1) the lower bound s−1 (log N) 2 T (ξ ) >C(s) (1) N N holds, where C(s) is a positive constant depending only on the dimension s. In 1986 Proinov [10] obtained a general lower bound of the diaphony of s arbitrary net. So, for any net ξN composed of N points in [0, 1) the lower bound s−1 (log N) 2 F ( s; ξN ) > α(s) (2) T N holds, where α(s) is a positive constant depending only on the dimension s. Cristea and Pillichshammer [2] obtained a general lower bound of the b adic − s diaphony of arbitrary net. So, for any net ξN composed of N points in [0, 1) the lower bound s−1 (log N) 2 F ( (b); ξN ) C(b, s) (3) W ≥ N holds, where C(b, s) is a positive constant depending on the base b and the dimension s. About the exactness of the lower bound (1) of the L2 discrepancy of two - dimensional nets the following results are obtained. Let b −2 and ν > 0 be fixed integers. ≥ Following Van der Corput [12] and Halton [6] for an arbitrary integer i ν−1 ν j such that 0 i b 1 with the b adic representation i = ijb we put ≤ ≤ − − j=0 $ ν−1 i p (i)= i b−j−1. Let us denote η (i)= and to construct the net b,ν j b,ν bν j=0 $ ν Rb,ν = (ηb,ν (i),pb,ν (i)) : 0 i b 1 . The net R2,ν was introduced and studied by{ Roth [11]. Hammersley≤ ≤ [8] used− } sequences of Van der Corput Halton constructed in different bases to construct a class of s dimensional nets.− − Roth proved that the L2 discrepancy T (R2,ν ) of the net R2,ν have an exact log N − order , where N =2ν . According to the lower bound (1) the order O N " # log N is not the best possible. O N " # There exist two important techniques to improve the order of the L2 discrepancy of arbitrary net. They are a digital shift and a symmetrization− of the points of the net. The first approach was realized in 1969 by Halton and Zaremba [7]. They used the original net of Van der Corput p ,ν (i)= { 2 0,i i ...iν : ij 0, 1 , 0 j ν 1 and changed the digits ij that 0 1 −1 ∈ { } ≤ ≤ − } stay on the even positions with the digits 1 ij. In this way the net z ,ν (i)= − { 2 0, (1 i0)i1(1 i2) ... : ij 0, 1 , 0 j ν 1 is obtained. The net − − ∈ { }ν ≤ ≤ − } Z ,ν = (η ,ν (i),z ,ν (i)) : 0 i 2 1 which now usually is called net of 2 { 2 2 ≤ ≤ − }
79 79 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Zaremba Halton is constructed. It is shown that the L discrepancy of the net − 2− √log N ν Z ,ν has an order ,N=2 , which of course is the best possible. 2 O N " # In 1998 Xiao proved the exactness of the lower bound (2) of the diaphony for dimension s =2. It is shown that the diaphony of the nets of Roth Rb,ν and Zaremba Halton Zb,ν , both constructed in arbitrary base b 2, have an exact − ≥ √log N order ,N= bν . O N " # Grozdanov and Stiolova [5] proved the exactness of the lower bound (3) of the b adic diaphony for dimension s =2. They proved that the b adic diaphony − − √log N ν of the nets Rb,ν and Zb,ν have an exact order ,N= b . O N " # p,q Grozdanov [3] constructed the so-called net of Zaremba Halton G,ϕZb,ν over the group G with respect to the bijection ϕ and obtained− the exact order √log N p,q of the generalized b adic diaphony of the nets Rb,ν and G,ϕZ . O N − b,ν " #
2 The nets of type of Zaremba-Halton constructed in generalized system
We will remind the concept of the so-called generalized number system or a system with variable bases. Let the sequence of integers B = b0,b1,b2,...: bi 2 for i 0 be given. The so-called generalized powers are defined{ by the next≥ ≥ } recursive equalities. We put B0 = 1 and for j 0 define Bj+1 = Bj.bj. Then, an arbitrary integer k 0 and a real x [0, 1)≥ in the generalized B-adic number system can be represented≥ of the form∈
ν ∞ xi k = kiBi and x = , Bi i=0 i=0 +1 $ $ where for i 0,ki,xi 0, 1,...,bi 1 ,kν = 0 and for infinitely many i we ≥ ∈ { − } ̸ have xi = bi 1. ̸ − For arbitrary reals x, y [0, 1) which in the B adic system have the rep- ∈ ∞ −∞ x y resentations of the form x = i and y = i , where for i 0 Bi Bi ≥ i=0 +1 i=0 +1 $ $ xi,yi 0, 1,...,bi 1 and for infinitely many i we have xi,yi = bi 1, we define∈ the{ operation− } ̸ −
∞ [0,1) xi + yi(mod bi) x B y = (mod 1). ⊕ Bi .i=0 +1 / $ In the next definition we will present the concept of the sequence of Van der Corput constructed in the generalized B adic system. −
80 80 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Definition 1 For an arbitrary integer i 0 which in the generalized B adic ≥ − system has the representation i = imBm + im Bm + + i B , where for −1 −1 ··· 0 0 0 j mij 0, 1,...,bj 1 and im =0, we put ≤ ≤ ∈ { − } ̸
i0 i1 im pB(i)= + + ...+ . B1 B2 Bm+1
The sequence ωB =(pB(i))i≥0 is called a sequence of Van der Corput constructed in the generalized system B.
The sequence ωB was investigated by many several authors and used for different purposes. Now we explain the constructive principle of two-dimensional nets (intro- duced in our mentioned paper) of the type of Zaremba-Halton constructed in generalized system. (1) (1) (1) (1) (1) (1) For this purpose let B = b ,b ,...,b ,bν ,b ,...: b 2 for j 1 { 0 1 ν−1 ν+1 j ≥ ≥ 0 be a given sequence of bases. By using the sequence B1 we will define the } (2) (2) (2) (2) (2) (2) sequence B2 = b0 ,b1 ,...,bν−1,bν ,bν+1,... : bj 2 for j 0 in the { (2) (1) (2) (1) (2)≥ (1) ≥ } following manner. We put b0 = bν−1,b1 = bν−2,...,bν−1 = b0 and the bases (2) (2) bν ,b ,... can be chosen arbitrary. Let us denote =(B ,B ). ν+1 B2 1 2 Let us assume that the sequences B1 and B2 of bases are limited from above, i. e. there exists a constant M 2 such that for each j 0 and τ =1, 2wehave (τ) ≥ ≥ bj M. ≤ (1) (1) (1) (1) Let ν 1 be an arbitrary and fixed integer. We have that Bν = b0 .b1 ...bν−1 (2) ≥ (2) (2) (2) (1) (1) (1) (1) (2) and Bν = b0 .b1 ...bν−1 = bν−1.bν−2 ...b0 , so Bν = Bν . Let us denote (1) (2) i Bν = Bν = Bν . For 0 i Bν 1 let us define the quantity ην (i)= . ≤ ≤ − Bν (1) Let κ =0, κν κν ...κ , where for 0 j ν 1 κν j 0, 1,...,b −1 −1 0 ≤ ≤ − −1− ∈ { j − 1 be a fixed B1 adic rational number. } − (2) Let µ =0,µ0µ1 ...µν−1, where for 0 j ν 1 µj 0, 1,...,bj 1 be a fixed B adic rational number. ≤ ≤ − ∈ { − } 2− The different choice of the parameters κ and µ permits us to obtain a class of two-dimensional nets with similar construction and distribution properties of the points in the plane. For an arbitrary i such that 0 i Bν 1 let us denote ≤ ≤ − κ [0,1) µ [0,1) η (i)=ην (i) κ and z (i)=pB2 (i) µ. ν ⊕B1 ν ⊕B2 Definition 2 Let ν 1 be an arbitrary and fixed integer. Let κ and µ be as above. The two-dimensional≥ net
κ,µ κ µ Z = (η (i),z (i)) : 0 i Bν 1 B2,ν { ν ν ≤ ≤ − } we will call a net of type of Zaremba-Halton constructed in the generalized adic system. B2−
81 81 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
In our mentioned paper we investigated and proved ”very well” uniform distribution of these nets. Our interest for the importance of these types of sequences is implied from their usage in numerical integration, construction of random number generators e.t.c.
3 Visualization results for nets of type of Zaremba-Halton
We will use the above mathematical model to present a version of the program in Mathematica which can construct arbitrary net of type of Zaremba - Halton, for the chosen parameters. In our work we will confine only to present a visualization of some concrete κ,µ nets from the class ZB2,ν and to show the distribution of the points of these nets. κ,µ Some theoretical investigations of the nets from the class ZB2,ν have been realized in other paper of the authors. Our code of the program, for some choices of parameters, is presented below: ( parameters ) nu∗ = 3; mi = ∗2, 1, 3 ;ni= 1, 0, 1 ; b2 [ 0 ] = 5; b2{ [ 1 ] = 2;} b2 [ 2 ] ={ 3; b2v =} 5, 2, 3 ; b1 [ 0 ] = 3; b1 [ 1 ] = 2; b1 [ 2 ] = 5; b1v = {3, 2, 5} ; B1 [ 0 ] = B2 [ 0 ] = 1 ; niza = ; { } ( end parameters ) {} ∗ ∗ Do [ B1 [ j ] = B1 [ j 1] b1 [ j 1]; B2 [ j ] = B2 [ j − 1]∗ b2 [ j − 1] , j, nu ] − ∗ − { } For [ i = 0 , i < b2 [ 2] , i++, For [ j = 0 , j < b2 [ 1] , j++, For [ k = 0 , k < b2 [ 0] , k++, pc = i, j, k ; { } p=Reverse[pc]; z=Mod[mi+p,b2v]; zv = Sum[ z [ [ l ]]/B2[ l ] , l, nu ]; et = Mod[ ni + pc , b1v ];{ } etv = Sum[et[[l]]/B1[l], l, nu ]; AppendTo [ niza , etv , zv ] { } { } ]]] We illustrate the work of the code by some examples. In the first example we make the following choice of the parameters: ν =3, (1) (1) (1) (2) (2) (2) b0 =3,b1 =2,b2 =5,b0 =5,b1 =2,b2 =3, κ =0.101 and µ =0.213. The distribution of the points of this net is given in Fig. 1.
82 82 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
0.8
0.6
0.4
0.2
0.2 0.4 0.6 0.8
(1) (1) (1) Fig. 1. b0 =3,b1 =2,b2 =5, κ =0.101 and µ =0.213.
κ,µ The points of the net ZB2,ν are:
11 15 12 21 13 27 14 3 10 9 16 12 Zκ,µ = , , , , , , , , , , , , B2,ν !" 30 30 # " 30 30 # " 30 30 # " 30 30 # " 30 30 # " 30 30 #
17 18 18 24 19 0 15 6 21 16 22 22 , , , , , , , , , , , , " 30 30 # " 30 30 # " 30 30 # " 30 30 # " 30 30 # " 30 30 # 23 28 24 4 20 10 26 13 27 19 28 25 29 1 , , , , , , , , , , , , , , " 30 30 # " 30 30 # " 30 30 # " 30 30 # " 30 30 # " 30 30 # " 30 30 # 25 7 1 17 2 23 3 29 4 5 0 11 , , , , , , , , , , , , " 30 30 # " 30 30 # " 30 30 # " 30 30 # " 30 30 # " 30 30 # 6 14 7 20 8 26 9 2 5 8 , , , , , , , , , . " 30 30 # " 30 30 # " 30 30 # " 30 30 # " 30 30 #$
1.0
0.8
0.6
0.4
0.2
0.2 0.4 0.6 0.8 1.0
(1) (1) (1) (1) Fig. 2. b0 =2,b1 =9,b2 =7,b3 =16, κ =0.180F and µ =0.F 101.
83 83 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
(1) (1) (1) (1) In the second example we take ν =4,b0 =2,b1 =9,b2 =7,b3 =16, (2) (2) (2) (2) b0 =16,b1 =7,b2 =9,b3 =2, κ =0.180F and µ =0.F 101, where F is the symbol for the number 15 in the number system with base 16. The distribution of the points of the second net is given in Fig.2. In the third example (see Fig.3) we make the following choice of the parameters: (1) (1) (1) (1) ν =4,b0 =2,b1 =3,b2 =7,b3 =11.
1.0
0.8
0.6
0.4
0.2
0.2 0.4 0.6 0.8 1.0
(1) (1) (1) (1) Fig. 3. b0 =2,b1 =3,b2 =7,b3 =11.
In the next example (see Fig. 4) we make the following choice of the parameters: (1) (1) (1) (1) (1) (1) ν =6,b0 =2,b1 =3,b2 =2,b3 =5,b4 =5,b5 =3.
1.0
0.8
0.6
0.4
0.2
0.2 0.4 0.6 0.8 1.0
(1) (1) (1) (1) (1) (1) Fig. 4. b0 =2,b1 =3,b2 =2,b3 =5,b4 =5,b5 =3.
84 84 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
(1) (1) Some more examples with ν =6, are presented on the Fig. 5 ( b0 =2,b1 =3, (1) (1) (1) (1) (1) (1) (1) (1) b2 =4,b3 =5,b4 =6,b5 =8)andFig.6(b0 =13,b1 =11,b2 =7,b3 =5, (1) (1) b4 =3,b5 =2).
1.0
0.8
0.6
0.4
0.2
0.2 0.4 0.6 0.8 1.0
(1) (1) (1) (1) (1) (1) Fig. 5. b0 =2,b1 =3,b2 =4,b3 =5,b4 =6,b5 =8.
1.0
0.8
0.6
0.4
0.2
0.2 0.4 0.6 0.8 1.0
(1) (1) (1) (1) (1) (1) Fig. 6. b0 =13,b1 =11,b2 =7,b3 =5,b4 =3,b5 =2.
85 85 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
4 Conclusion
We made visualizations with mathematical software Mathematica of the two-dimensional Zκ,µ B − nets B2,ν of type of Zaremba-Halton constructed in generalized 2 adic system. We show some examples with different number of points, different bases and dif- ferent shift - vectors. All of the realized experiments with graphical representations, where the number of points in the nets was between 24 and 2 000 000, very well confirm the theoretical results, about uniform distribution of the points in the introduced net, obtained in our previous mentioned paper.
References
1. Chrestenson, H. E.: A class of generalized Walsh functions. Pacific J. Math. 5,17–31 (1955). 2. Cristea, L. L., Pillichshammer, F.: A lower bound on the b−adic diaphony. Ren. Math. Appl. 27 (2), 147 –153 (2007). 3. Grozdanov, V.: The generalized b−adic diaphony of the net of Zaremba - Halton over finite abelian groups. Comp. Rendus Akad. Bulgare Sci. 57 (6), 43–48 (2004). 4. Grozdanov, V., Nikolova, E. V., Stoilova, S. S.: Generalized b−adic diaphony. Comp. Rendus Akad. Bulgare Sci. 56 (4), 137- -142 (2003). 5. Grozdanov, V., Stoilova, S. S.: On the theory of b−adic diaphony. Comp. Rendus Akad. Bulgare Sci. 54 (3), 31–34 (2001). 6. Halton, J. H.: On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numer Math. 2,84–90(1960). 7. Halton, J. H., Zaremba, S. K.: The extreme and L2 discrepancy of some plane sets. Monatsh. Math. 73,316–328(1969). 8. Hammersley, J. M.: Monte Carlo methods for solving multivariable problems. Ann. New York Acad. Sci. 86,(3),844–874(1960). 9. Larcher, G., Niederreiter, H., Schmid W. Ch.: Digital nets and sequences constructed over finite rings and their applications to quasi-Monte Carlo integration. Monatsh. Math. 121 (3), 231–253 (1996). 10. Proinov, P. D.: On irregularities of distribution. Comp. Rendus Akad. Bulgare Sci. 39 (9), 31- -34 (1986). 11. Roth, K.: On the irregularities of distribution. Mathematika 1,(2),73–79(1954). 12. Van der Corput J. G., Verteilungsfunktionen I. Proc. Akad. Wetensch. Amsterdam 38 (8), 813–821 (1935). 13. Walsh, J. L.: A closed set of normal orthogonal functions. Amer. J. Math. 45,5–24 (1923). 14. Zinterhof, P.: Uber¨ einige Absch¨atzungen bei der Approximation von Funktionen mit Gleichverteilungsmethoden. S. B. Akad. Wiss., math.-naturw. Klasse, Abt. II 185, 121–132 (1976).
86 86 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
Minimizing Total Weight of Feedback Arcs in Flow Graphs
Marija Mihova, Ilinka Ivanoska, Kristijan Trifunovski, Kire Trivodaliev
Ss. Cyril and Methodius University, Faculty of Computer Science and Engineering, 1000 Skopje, Macedonia {marija.mihova,ilinka.ivanoska,kire.trivodaliev}@finki.ukim.mk, [email protected]
Abstract. The problem of determining the minimal weight feedback edge set in an oriented weighted graph when the graph represents a network flow is the fo- cus of this paper. The proposed solution is especially interesting since it can be directly applied in solving the linearization of genome sequence graphs (GSG) which represent a set of individual haploid reference genomes as paths in a sin- gle graph and can find frequent human genetic variations, including transloca- tions, inversions, deletions, and insertions. The linearization is essential for the efficiency of storing and traversal as well as visualization of the graph. In this paper properties of the flow function are used to obtain some features of the op- timal ordering of nodes when the weight of the feedback edge set is minimal.
Keywords: Two-terminal flow network × Feedback arcs × Linearization × Flow graph.
1 Introduction
In graph theory, a feedback arc set, or feedback edge set in an unweighted directed graph is a set of edges which, when removed from the graph, leave a directed acyclic graph, or DAG. In fact, it is a set containing at least one edge of every cycle in the graph. Finding a minimal set with this property is a key step in layered graph drawing [1]. In weighted graph this operation can be defined as a minimization of the total weight of feedback arcs, which is the minimum weight of the edges such that when removed from the graph, a DAG is obtained. This problem has various applications, most prominent ones being tournaments graphs [5] and genome sequence graph [3]. The decision version of the problem is NP-complete and one of the Richard M. Karp's NP-complete problems [6]. There are some algorithms whose running time is a fixed polynomial in the size of the input graph, but exponential in the number of edges in the feedback arc set [2] or dimension of a cycle space [4], although some special cas- es of tournament graphs can be solved in polynomial time. The focus of application in this paper is finding the minimal feedback edge set of a Genome Sequence Graph (GSG) [9], a graph used to represent the human genetic variation into the reference human genome. Each node of a GSG represents a single
87 87 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
DNA base that occurs at an orthologous locus in one or more of the haploid genomes represented. Each arc corresponds to an adjacency that occurs between consecutive instances of bases in the represented genomes, and it is directed according to the de- fault strand direction of the DNA sequence. The weight to each arc is added in order to emphasize the importance of an arc in typical applications running on the graph, for example number of times that the arc is traversed in the reference genomes used to build the graph. The representation of GSG has importance for the efficiency and the effectiveness of the operations such as access, traversal and visualization, all of which depend on the GSG structure/representation, and the best GSGs are ones that have their nodes ordered in a straight line, a structure obtained in a process known as line- arization of the graph. Usually the linearization of a sequence graph aims to lessen the total weight of all feedback arcs, i.e. the weighted feedback, as much as possible. On the other hand, a GSG graph can be easily modified into a graph that represent a flow, which means that the inflow into each node representing a genome is equal to the out flow out of any node representing a genome. This characteristic gives specific fea- tures to the minimal weighted feedback edge ordering, so pointing and distinguishing some of them is the main interest in this paper. The rest of this paper is organized as follows. The second section presents the terminology and definitions from network flow theory used in the research. The next section defines the linearization of Genome Sequence Graph as a network flow prob- lem. Section 4 presents our findings in terms of specific features the optimal ordering of nodes has when the weight of the feedback edge set is minimal. The paper is con- cluded in the fifth section.
2 Flow Function and Minimal Path Vectors
A flow in an undirected network G(V, E, c) is a function f : V × V®R+È{0} that satis- fies the following three constraints:
1. Capacity constraint: 0 ≤ f (u, v) ≤ c({u, v}), for all u, vÎV, i.e., the flow of an edge cannot exceed its capacity.
2. Flow conservation: for each vÎV,
0, % ∉ {4, 5}
!(#, %) − !(%, #) = ∑+∈- !(*, %) − ∑/∈- !(%, .) = 0 |!|, % = 4 , (1) −|!|, % = 5
where |f | is the value of the flow. In other words, the total flow entering a node v, !(#, %), must be equal to the total flow leaving node v, !(%, #), ∀v ÎV\{s, t}; the flow leaving s and the flow entering t is equal to the value of the flow.
3. For all vertices u, vÎV, if f(u, v) > 0, then f(v, u) =0. In other words, each flow uses a given edge only in one direction. It is assumed that if there is no edge {u, v}, i.e. {u, v}ÏE, then f(u, v) = 0.
88 88 ICT Innovations 2019 Web Proceedings ISSN 1857-7288
In the rest of the paper we will use the term flow graph for a weighted directed graph with flow as a weight function. Here we give some additional definitions that help define the problem and explain the algorithm.
Definition 1. Let G(V, E, c) be a two-terminal flow network. For a flow f, we define flow vector :<<<;⃗ induced by f by
xi = f (u, v) = f (ei) , (2) where ei ={u, v}.
Definition 2. Given a two-terminal undirected flow network G(V, E, c), the vector :⃗ is a path vector to level d, d-P, if and only if there is a flow f with a value |f | = d, such that for all edges ei = {u, v}, f(u, v) £ xi. In other words, :⃗ is a path vector to level d if and only if a flow d may be delivered in the two-terminal network G(V, E, :⃗).
Definition 3. The state vector :⃗ is a minimal path vector to level d, d-MP, if and only if the two-terminal flow network G(V, E, :⃗) has a maximum flow d, and for each >⃗ ≤ :⃗, the two-terminal flow network G(V, E, >⃗) has a maximum flow less or equal to d.
Theorem 1. The state vector :⃗ is a d-MP for the two-terminal flow network G(V, E, c) iff :⃗ corresponds to a flow function with flow d and the corresponding graph G(V, E, :⃗) has no cycles [7, 8].
Corollary 1. Each 0-MP is a sum of simple cycles <@< Corollary 2. If :⃗ is a flow vector for flow d and <@< >⃗ = :⃗ − ∑A @<<⃗ is a d-MP. Example 1. Let us consider the graph given on Fig. 1 a). This graph contains cycles, and one of those cycles is . The flow through that cycle is equal to 2. We can take out that flow from the graph, thus obtain the graph presented on Fig. 2 b). The obtained graph contains a cycle again, , and by removing this cycle we will obtain the graph given on Fig. 2. If we order the nodes as 89 89 ICT Innovations 2019 Web Proceedings ISSN 1857-7288 a) b) Fig. 1. The graph from Example 1. a) is a cycle with capacity 2, that should be re- moved from the graph. b) The graph obtained from a). Fig. 2. Acyclic graph obtained by removing cycles from the graph shown on Fig 1. 3 Linearization of Genome Sequence Graphs Genome Sequence Graph (GSG), Fig. 3, is a representation of DNA sequences that helps to incorporate human genetic variation into the reference human genome (trans- locations, inversions, deletions, and insertions.) [9]. Fig. 3. Genome sequence graph for CATCGCT, CATCGT, CAGCGAT and CATCGAGAGAGCT. One of the most important operations in GSGs is graph linearization, essential both for efficiency of storage and access, as well as for natural graph visualization and compatibility with other tools. The linearization of a sequence graph aims to lessen the total weight of all feedback arcs, i.e. the weighted feedback, as much as possible. In the rest of the paper we will denote linear ordered set of nodes by 90 90 ICT Innovations 2019 Web Proceedings ISSN 1857-7288 source to it, with a weight equal to the number of genomes that start with it. For each node that represents an ending DNA base, we add a weighted arc from that node to the sink. The weight of that arc is equal to the number of genomes that end with that node, Fig. 4. Fig. 4. Modification of the initial DNA graph by adding source and sink. This way we obtain a graph whose weight function is in fact a flow function, be- cause the total flow out of the source is equal to the total flow entering the sink, and for all other nodes, total flow leaving a node is equal to the total flow entering a node [7, 8]. In fact, such a graph represents a network flow function with flow equal to the number of genome sequences. The max-flow, M, of this network is equal to the total flow out of the source, so any minimal path vector for level M, i.e. minimal flow function for flow M, is an undirected graph. The features of these minimal paths can help in minimizing the weighted feedback of the graph. 4 Total Weight Feedback Arcs and min-paths The first main new theorem in this part says that if some ordering of the nodes corre- sponds to the minimal weight of feedback edges, then this ordering also corresponds to a topological sort of some minimal path. Theorem 2. Let G(V, E, @) be a flow graph with flow d, where |V| = n, and let Proof. Assume that there is an ordering of nodes with a minimal weight of feedback edges such that the flow in the graph after removing forward edges is d’< d. In the graph obtained from G by removing feedback edges, we can find some d’-MP :⃗. It is obvious that the graph G’(V, E, @-x) is a flow graph of level d – d’ and that any flow of level d – d’ uses a collection of feedback edges (vj1, vi1), (vj2, vi2), …, (vjk, vik), is a collection of feedback edges with weights d1, d2,…, d1k. Moreover, if we create a new graph by adding a source s’ and a sink t’ such that G’’(V’={{min{vim},…, max{vjm}}, E/ V’È{s’,t’}, @/V’È{s’,t’},), Where @/V’È{s’,t’} is: {c(uk,vl)| k, the graph on Fig 2 has not forward edges. Moreover, if we order the nodes of the graph on Fig 1 a), the total weight of the feed- back edges will be 3, which is the minimal possible weighted feedback of this graph. be an ordering of nodes with a minimal weight of feedback edges. Then there is a d-MP :⃗ for which