Sonja Gievska • Gjorgji Madjarov (Eds.)

ICT Innovations 2019
Big Data Processing and Mining
11th International Conference, ICT Innovations 2019
Ohrid, North Macedonia, October 17–19, 2019
Proceedings

Preface

The ICT Innovations conference series, organized by the Macedonian Society of Information and Communication Technologies (ICT-ACT), is an international forum for presenting scientific results related to innovative fundamental and applied research in ICT. The 11th ICT Innovations 2019 conference, which brought together academics, students, and industrial practitioners, was held in Ohrid, Republic of North Macedonia, during October 17–19, 2019. The focal point of this year’s conference was “Big Data Processing and Mining,” with topics extending across several fields including social network analysis, natural language processing, deep learning, sensor network analysis, bioinformatics, FinTech, privacy, and security.

Big data is heralded as one of the most exciting challenges in data science, as well as the next frontier of innovations. The spread of smart, ubiquitous computing and social networking has brought to light more information to consider. Storage, integration, processing, and analysis of massive quantities of data pose significant challenges that have yet to be fully addressed. Extracting patterns from big data opens exciting new fronts for behavioral analytics, predictive and prescriptive modeling, and knowledge discovery. By leveraging the advances in deep learning, stream analytics, large-scale graph analysis, and distributed data mining, a number of tasks in fields like biology, games, robotics, commerce, transportation, and health care have been brought within reach. Some of these topics were brought to the forefront of the ICT Innovations 2019 conference.

This book presents a selection of papers presented at the conference that contributed to the discussions on various aspects of big data mining (including algorithms, models, systems, and applications). The conference gathered 184 authors from 24 countries reporting their scientific work and solutions in ICT.
Only 18 papers were selected for this edition by the international Program Committee, consisting of 176 members from 43 countries, chosen for their scientific excellence in their specific fields. We would like to express our sincere gratitude to the authors for sharing their most recent research, practical solutions, and experiences, allowing us to contribute to the discussion on the trends, opportunities, and challenges in the field of big data. We are grateful to the reviewers for the dedicated support they provided to our thorough reviewing process. Our work was made easier by following the procedures developed and passed along by Prof. Slobodan Kalajdziski, the co-chair of the ICT Innovations 2018 conference. Special thanks to Ilinka Ivanoska, Bojana Koteska, and Monika Simjanoska for their support in organizing the conference and for the technical preparation of the conference proceedings.

October 2019

Sonja Gievska
Gjorgji Madjarov

Organization

Conference and Program Chairs

Sonja Gievska University Ss. Cyril and Methodius, North Macedonia
Gjorgji Madjarov University Ss. Cyril and Methodius, North Macedonia

Program Committee

Jugoslav Achkoski General Mihailo Apostolski Military Academy, North Macedonia
Nevena Ackovska University Ss. Cyril and Methodius, North Macedonia
Syed Ahsan Technische Universität Graz, Austria
Marco Aiello University of Groningen, The Netherlands
Azir Aliu Southeastern European University of North Macedonia, North Macedonia
Luis Alvarez Sabucedo Universidade de Vigo, Spain
Ljupcho Antovski University Ss. Cyril and Methodius, North Macedonia
Stulman Ariel The Jerusalem College of Technology, Israel
Goce Armenski University Ss. Cyril and Methodius, North Macedonia
Hrachya Astsatryan National Academy of Sciences of Armenia, Armenia
Tsonka Baicheva Bulgarian Academy of Science, Bulgaria
Verica Bakeva University Ss. Cyril and Methodius, North Macedonia
Antun Balaz Institute of Physics Belgrade, Serbia
Lasko Basnarkov University Ss. Cyril and Methodius, North Macedonia
Slobodan Bojanic Universidad Politécnica de Madrid, Spain
Erik Bongcam-Rudloff SLU-Global Bioinformatics Centre, Sweden
Singh Brajesh Kumar RBS College, India
Torsten Braun University of Berne, Switzerland
Andrej Brodnik University of Ljubljana, Slovenia
Francesc Burrull Universidad Politécnica de Cartagena, Spain
Neville Calleja University of Malta, Malta
Valderrama Carlos UMons University of Mons, Belgium
Ivan Chorbev University Ss. Cyril and Methodius, North Macedonia
Ioanna Chouvarda Aristotle University of Thessaloniki, Greece
Trefois Christophe University of Luxembourg, Luxembourg
Betim Cico Epoka University, Albania
Emmanuel Conchon Institut de Recherche en Informatique de Toulouse, France
Robertas Damasevicius Kaunas University of Technology, Lithuania
Pasqua D’Ambra IAC, CNR, Italy
Danco Davcev University Ss. Cyril and Methodius, North Macedonia
Antonio De Nicola ENEA, Italy

Domenica D’Elia Institute for Biomedical Technologies, Italy
Vesna Dimitrievska Ristovska University Ss. Cyril and Methodius, North Macedonia
Vesna Dimitrova University Ss. Cyril and Methodius, North Macedonia
Ivica Dimitrovski University Ss. Cyril and Methodius, North Macedonia
Salvatore Distefano University of Messina, Italy
Milena Djukanovic University of Montenegro, Montenegro
Ciprian Dobre University Politehnica of Bucharest, Romania
Martin Drlik Constantine the Philosopher University in Nitra, Slovakia
Sissy Efthimiadou ELGO Dimitra Agricultural Research Institute, Greece
Tome Eftimov Stanford University, USA, and Jožef Stefan Institute, Slovenia
Stoimenova Eugenia Bulgarian Academy of Sciences, Bulgaria
Majlinda Fetaji Southeastern European University of North Macedonia, North Macedonia
Sonja Filiposka University Ss. Cyril and Methodius, North Macedonia
Predrag Filipovikj Mälardalen University, Sweden
Ivan Ganchev University of Limerick, Ireland
Todor Ganchev Technical University Varna, Bulgaria
Nuno Garcia Universidade da Beira Interior, Portugal
Andrey Gavrilov Novosibirsk State Technical University, Russia
Ilche Georgievski University of Groningen, The Netherlands
John Gialelis University of Patras, Greece
Sonja Gievska University Ss. Cyril and Methodius, North Macedonia
Hristijan Gjoreski University Ss. Cyril and Methodius, North Macedonia
Dejan Gjorgjevikj University Ss. Cyril and Methodius, North Macedonia
Danilo Gligoroski Norwegian University of Science and Technology, Norway
Rossitza Goleva Technical University of Sofia, Bulgaria
Andrej Grgurić Ericsson Nikola Tesla d.d., Croatia
David Guralnick International E-Learning Association (IELA), France
Marjan Gushev University Ss. Cyril and Methodius, North Macedonia
Elena Hadzieva University St. Paul the Apostle, North Macedonia
Violeta Holmes University of Huddersfield, UK
Ladislav Huraj University of SS. Cyril and Methodius, Slovakia
Sergio Ilarri University of Zaragoza, Spain
Natasha Ilievska University Ss. Cyril and Methodius, North Macedonia
Ilija Ilievski Graduate School for Integrative Sciences and Engineering, Singapore
Mirjana Ivanovic University of Novi Sad, Serbia
Boro Jakimovski University Ss. Cyril and Methodius, North Macedonia
Smilka Janeska-Sarkanjac University Ss. Cyril and Methodius, North Macedonia
Mile Jovanov University Ss. Cyril and Methodius, North Macedonia
Milos Jovanovik University Ss. Cyril and Methodius, North Macedonia

Zucko Jurica Faculty of Food Technology and Biotechnology, Croatia
Slobodan Kalajdziski University Ss. Cyril and Methodius, North Macedonia
Kalinka Kaloyanova FMI-University of Sofia, Bulgaria
Aneta Karaivanova Bulgarian Academy of Sciences, Bulgaria
Mirjana Kljajic Borstnar University of Maribor, Slovenia
Ljupcho Kocarev University Ss. Cyril and Methodius, North Macedonia
Dragi Kocev Jožef Stefan Institute, Slovenia
Margita Kon-Popovska University Ss. Cyril and Methodius, North Macedonia
Magdalena Kostoska University Ss. Cyril and Methodius, North Macedonia
Bojana Koteska University Ss. Cyril and Methodius, North Macedonia
Ivan Kraljevski VoiceINTERconnect GmbH, Germany
Andrea Kulakov University Ss. Cyril and Methodius, North Macedonia
Arianit Kurti Linnaeus University, Sweden
Xu Lai Bournemouth University, UK
Petre Lameski University Ss. Cyril and Methodius, North Macedonia
Suzana Loshkovska University Ss. Cyril and Methodius, North Macedonia
José Machado Da Silva University of Porto, Portugal
Ana Madevska Bogdanova University Ss. Cyril and Methodius, North Macedonia
Gjorgji Madjarov University Ss. Cyril and Methodius, North Macedonia
Yousef Malik Zefat Academic College, Israel
Tudruj Marek Polish Academy of Sciences, Poland
Ninoslav Marina University St. Paul the Apostle, North Macedonia
Smile Markovski University Ss. Cyril and Methodius, North Macedonia
Marcin Michalak Silesian University of Technology, Poland
Hristina Mihajlovska University Ss. Cyril and Methodius, North Macedonia
Marija Mihova University Ss. Cyril and Methodius, North Macedonia
Aleksandra Mileva University Goce Delcev, North Macedonia
Biljana Mileva Boshkoska Faculty of Information Studies in Novo Mesto, Slovenia
Georgina Mirceva University Ss. Cyril and Methodius, North Macedonia
Miroslav Mirchev University Ss. Cyril and Methodius, North Macedonia
Igor Mishkovski University Ss. Cyril and Methodius, North Macedonia
Kosta Mitreski University Ss. Cyril and Methodius, North Macedonia
Pece Mitrevski University St. Kliment Ohridski, North Macedonia
Irina Mocanu University Politehnica of Bucharest, Romania
Ammar Mohammed Cairo University, Egypt
Andreja Naumoski University Ss. Cyril and Methodius, North Macedonia
Manuel Noguera University of Granada, Spain
Thiare Ousmane Gaston Berger University, Senegal
Eleni Papakonstantinou Genetics Lab, Greece
Marcin Paprzycki Polish Academy of Sciences, Poland
Dana Petcu West University of Timisoara, Romania
Antonio Pinheiro Universidade da Beira Interior, Portugal
Matus Pleva Technical University of Košice, Slovakia
Florin Pop University Politehnica of Bucharest, Romania

Zaneta Popeska University Ss. Cyril and Methodius, North Macedonia
Aleksandra Popovska-Mitrovikj University Ss. Cyril and Methodius, North Macedonia
Marco Porta University of Pavia, Italy
Ustijana Rechkoska Shikoska University of Information Science and Technology St. Paul The Apostle, North Macedonia
Manjeet Rege University of St. Thomas, USA
Bernd Rinn ETH Zurich, Switzerland
Blagoj Ristevski University St. Kliment Ohridski, North Macedonia
Sasko Ristov University Ss. Cyril and Methodius, North Macedonia
Witold Rudnicki University of Białystok, Poland
Jelena Ruzic Mediterranean Institute for Life Sciences, Croatia
David Šafránek Masaryk University, Czech Republic
Simona Samardjiska University Ss. Cyril and Methodius, North Macedonia
Wibowo Santoso Central Queensland University, Australia
Snezana Savovska University St. Kliment Ohridski, North Macedonia
Loren Schwiebert Wayne State University, USA
Xu Shuxiang University of Tasmania, Australia
Vladimir Siládi Matej Bel University, Slovakia
Josep Silva Universitat Politècnica de València, Spain
Ana Sokolova University of Salzburg, Austria
Michael Sonntag Johannes Kepler University Linz, Austria
Dejan Spasov University Ss. Cyril and Methodius, North Macedonia
Todorova Stela University of Agriculture, Bulgaria
Goran Stojanovski Elevate Global, North Macedonia
Biljana Stojkoska University Ss. Cyril and Methodius, North Macedonia
Ariel Stulman The Jerusalem College of Technology, Israel
Spinsante Susanna Università Politecnica delle Marche, Italy
Ousmane Thiare Gaston Berger University, Senegal
Biljana Tojtovska University Ss. Cyril and Methodius, North Macedonia
Yalcin Tolga NXP Labs, UK
Dimitar Trajanov University Ss. Cyril and Methodius, North Macedonia
Ljiljana Trajkovic Simon Fraser University, Canada
Vladimir Trajkovik University Ss. Cyril and Methodius, North Macedonia
Denis Trcek University of Ljubljana, Slovenia
Christophe Trefois University of Luxembourg, Luxembourg
Kire Trivodaliev University Ss. Cyril and Methodius, North Macedonia
Katarina Trojacanec University Ss. Cyril and Methodius, North Macedonia
Hieu Trung Huynh Industrial University of Ho Chi Minh City, Vietnam
Zlatko Varbanov Veliko Tarnovo University, Bulgaria
Goran Velinov University Ss. Cyril and Methodius, North Macedonia
Elena Vlahu-Gjorgievska University of Wollongong, Australia
Irena Vodenska Boston University, USA
Katarzyna Wac University of Geneva, Switzerland
Yue Wuyi Konan University, Japan
Zeng Xiangyan Fort Valley State University, USA

Shuxiang Xu University of Tasmania, Australia
Rita Yi Man Li Hong Kong Shue Yan University, Hong Kong, China
Malik Yousef Zefat Academic College, Israel
Massimiliano Zannin INNAXIS Foundation Research Institute, Spain
Zoran Zdravev University Goce Delcev, North Macedonia
Eftim Zdravevski University Ss. Cyril and Methodius, North Macedonia
Vladimir Zdravevski University Ss. Cyril and Methodius, North Macedonia
Katerina Zdravkova University Ss. Cyril and Methodius, North Macedonia
Jurica Zucko Faculty of Food Technology and Biotechnology, Croatia
Chang Ai Sun University of Science and Technology Beijing, China
Yin Fu Huang University of Science and Technology, Taiwan
Suliman Mohamed Fati INTI International University, Malaysia
Hwee San Lim University Sains Malaysia, Malaysia
Fu Shiung Hsieh University of Technology, Taiwan
Zlatko Varbanov Veliko Tarnovo University, Bulgaria
Dimitrios Vlachakis Genetics Lab, Greece
Boris Vrdoljak University of Zagreb, Croatia
Maja Zagorščak National Institute of Biology, Slovenia

Scientific Committee

Danco Davcev University Ss. Cyril and Methodius, North Macedonia
Dejan Gjorgjevikj University Ss. Cyril and Methodius, North Macedonia
Boro Jakimovski University Ss. Cyril and Methodius, North Macedonia
Aleksandra Popovska-Mitrovikj University Ss. Cyril and Methodius, North Macedonia
Sonja Gievska University Ss. Cyril and Methodius, North Macedonia
Gjorgji Madjarov University Ss. Cyril and Methodius, North Macedonia

Technical Committee

Ilinka Ivanoska University Ss. Cyril and Methodius, North Macedonia
Monika Simjanoska University Ss. Cyril and Methodius, North Macedonia
Bojana Koteska University Ss. Cyril and Methodius, North Macedonia
Martina Toshevska University Ss. Cyril and Methodius, North Macedonia
Frosina Stojanovska University Ss. Cyril and Methodius, North Macedonia

Additional Reviewers

Emanouil Atanassov
Emanuele Pio Barracchia

Table of Contents

ICT Innovations 2019

Performance Analysis of ECG Processing on Mobile Devices ...... 1
Monika Simjanoska, Bojana Koteska, Marjan Gusev, Ana Madevska Bogdanova

Public NTP Stratum 2 Server for Time Synchronization of Devices and Applications in Republic of North Macedonia ...... 16
Bogdan Jeliskoski, Veno Pachovski, Dobre Blazevski, Irena Stojmenovska

Risk analysis and risk management in IT industry ...... 31
Aleksandar Kovachev, Vesna Dimitrova

A Review-based Survey of Mobile Student Service ...... 42
Sasho Gramatikov, Aleksandar Spasov, Ana Gjorgjevikj, Dimitar Trajanov

Rating and Geo-Mapping of Travel Agencies that Provide Vacations to Long-Distance Destinations from Skopje, North Macedonia ...... 58
Dina Gitova, Elena M. Jovanovska, Andreja Naumoski

Using continuous integration principles for managing virtual laboratories ...... 67
Dejan Stamenov, Dimitar Venov, Goran Petkovski, Boro Jakimovski, Goran Velinov

Visualization of the nets of type of Zaremba-Halton constructed in generalized number system ...... 76
Vesna Dimitrievska Ristovska, Vassil Grozdanov, Tsvetelina Petrova

Minimizing Total Weight of Feedback Arcs in Flow Graphs ...... 87
Marija Mihova, Ilinka Ivanoska, Kristijan Trifunovski, Kire Trivodaliev

HPM-CFSI: A High-Performance Algorithm for Mining Closed Frequent Significance Itemset ...... 98
Huan Phan, Bac Le

Object Detection Using Synthesized Data ...... 110
Matija Buric, Goran Paulin, Marina Ivasic-Kos

Comparing Databases for Inserting and Querying JSONs for Big Data ...... 125
Panche Ribarski, Bojan Ilijoski, Biljana Tojtovska

SHELDON Workshop - Solutions for ageing well at home

Telepresence mobile robot platform for elderly care ...... 137
Natasa Koceska, Saso Koceski

BIM and AAL ...... 140
Volker Koch, Maria Victoria Gómez Gómez, Petra von Both

Interactive textiles for designing adaptive space for elderly ...... 143
Amine Haj Taieb, Amal Lajmi

Solutions in built environment to prevent social isolation ...... 145
Veronika Kotradyova

A short history of IoT architectures through real AAL Cases ...... 153
Pedro C. López-García, Andrés L. Bleda-Tomás, Miguel Ángel Beteta-Medina, Jose Arturo Vidal-Poveda, Francisco J. Melero-Muñoz, Rafael Maestre-Ferriz

Resource Allocation using network Centrality in the AAL environment ...... 157
Katerina Papanikolaou, Constandinos Mavromoustakis, Ciprian Dobre

Framework for the development of mobile AAL applications ...... 161
Katerina Papanikolaou, Constandinos Mavromoustakis, Ciprian Dobre

SHELDON Workshop - Solutions for ageing well in the community

The use of Technology to promote activity and communication by the municipality services, examples from an urban area ...... 164
Birgitta Langhammer, Anita Woll, Jeppe Skjønberg

Thermal environment assessment and adaptative model analysis in elderly centers in the Atlantic climate ...... 169
Pedro Torres, Joana Madureira, Lívia Aguiar, Cristiana Pereira, Armanda Teixeira-Gomes, Maria Paula Neves, João Paulo Teixeira, Ana Mendes

Climate change in the city and the elderlies ...... 172
Emanuele Naboni

Loneliness is not simply being alone. How understand the loneliness in ageing to create smart environments ...... 175
Lucia González López, Ana Perandrés Gómez, Alfonso Cruz Lendínez

From Co-design to Industry: investigating both ends of the scale of developing Social Robots ...... 178
Kasper Rodil, Michal Isaacson

Thermal comfort perception in elderly care centers. Case study of Lithuania ...... 181
Lina Seduikyte, Mantas Dobravalskis, Ugnė Didžiariekytė

Human machine interaction - Assessment of seniors' characteristics and its importance for predicting performance patterns ...... 184
Karel Schmeidler, Katerina Marsalkova

Potential biomarkers and risk factors associated with frailty syndrome in older adults - preliminary results from the BioFrail study ...... 187
Armanda Teixeira-Gomes, Solange Costa, Bruna Lage, Susana Silva, Vanessa Valdiglesias, Blanca Laffon, João Paulo Teixeira

Consciousness Society creation. Stage development ...... 189
Dumitru Todoroi

Attempts to improve the situation of elderly people in the Republic of Moldova ...... 192
Dumitru Todoroi

SHELDON Workshop - Solutions for ageing well at work

Quantitative analysis of publications on policies for ageing at work ...... 195
Jasmina Baraković Husić, Sabina Baraković, Petre Lameski, Eftim Zdravevski, Vladimir Trajkovik, Ivan Chorbev, Igor Ljubić, Petra Marešová, Ondrej Krejcar, Signe Tomsone

Can deep learning contribute to healthy aging at work? ...... 198
Petre Lameski, Eftim Zdravevski, Ivan Chorbev, Rossitza Goleva, Nuno Pombo, Nuno M. Garcia, Ivan Pires, Vladimir Trajkovik

Work and social activity for longer healthy life ...... 201
Lilia Raycheva

Reverse mentoring as an alternative strategy for lifelong learning in the workplace ...... 204
Nežka Sajinčič, Andreja Istenič Starčič, Anna Sandak

What users want - research on workers, employers and caregivers demands on SmartWork AI system ...... 207
Willeke van Staalduinen, Carina Dantas, Sofia Ortet

Healthy aging at work: Trends in scientific publications ...... 210
Eftim Zdravevski, Petre Lameski, Ivan Chorbev, Rossitza Goleva, Nuno Pombo, Nuno M. Garcia, Ivan Pires, Vladimir Trajkovik

SHELDON Workshop - Technology Adoption

Challenges in building AI-based connected health services for older adults ...... 213
Hrvoje Belani, Igor Ljubić, Kristina Drusany Starič

Connected health and wellbeing for people with complex communication needs ...... 216
Hrvoje Belani, Kristina Drusany Starič, Igor Ljubić

Analysis on competences and needs of senior citizens related to domotics ...... 219
Tamara Suttner, Paul Held

Comparison of technology adoption models ...... 222
Agnieszka Landowska

Choice architecture to promote the adoption of new technologies ...... 225
Dean Lipovac, Michael D. Burnard

Swallowing the bitter pill: Emotional barriers to the adoption of medicine compliance systems among older adults ...... 228
Vered Shay, Kuldar Taveter, Michal Isaacson

An edge computing robot experience for automatic elderly mental health care based on voice ...... 231
C. Yvanoff-Frenchin, V. Ramos, T. Belabed, C. Valderrama

EMBNet Workshop

Towards Integration Exposome Data and Personal Health Records in the Age of IoT ...... 237
Savoska Snezana, Blagoj Ristevski, Natasha Blazheska Tabakovska, Ilija Jolevski

Complex Network Analysis and Big Omics Data ...... 247
Blagoj Ristevski, Snezana Savoska, Pece Mitrevski

Big Data analysis and reproducibility: new challenges and needs of the post-genomics era ...... 253
Aspasia Efthimiadou, Eleni Papakonstantinou, Dimitrios Vlachakis, Domenica D'Elia, Erik Bongcam-Rudloff

Author Index ...... 258

ICT Innovations 2019 Web Proceedings
ISSN 1857-7288

Performance Analysis of ECG Processing on Mobile Devices

Monika Simjanoska [0000-0002-5028-3841], Bojana Koteska [0000-0001-6118-9044], Marjan Gusev [0000-0003-0351-9783], and Ana Madevska Bogdanova [0000-0002-0906-3548]

Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University,
Rugjer Boshkovik 16, PO Box 393, 1000 Skopje, Macedonia
{monika.simjanoska,bojana.koteska,marjan.gushev,ana.madevska.bogdanova}@finki.ukim.mk

Abstract. Recently, there is a great focus on preventive medicine and, consequently, many telemedicine applications have been developed for real-time health monitoring. The monitoring requires a set of biosensors and tablets, smartphones, or other mobile devices. The biosensors continuously scan and transmit data, and the mobile devices have the capacity to process the data and, at the same time, to save the battery as much as possible. In this paper, we assess the performance of an algorithm for ECG-derived heart rate and respiratory rate. Three different scenarios are inspected, encompassing local, remote server, and remote MatLab© server processing of the locally collected ECG signals acquired by biosensors attached to a person. We set a hypothesis that processing locally instead of remotely is worse in terms of time and power consumption, and thus that developing a web service for processing will be best in both time and battery consumption. A total of 1297 signals have been processed. The results show that the mobile device is appropriate for processing ECG files up to approximately 5 MB in size. The processing of larger files spends too many resources when compared to the remote processing solution. Additionally, we show that splitting large files into chunks of up to 4 MB per chunk results in faster processing in the local scenario.

Keywords: ECG · Algorithm · Signal processing · Mobile Platform.

1 Introduction

Preventive rather than curative medicine has brought small laboratories into people's own devices. Biosensors are no longer high-cost equipment available only to hospital units or to large research centers. Instead, there is a trend of developing smaller units to measure a person's vital parameters without degrading their comfort. Those biosensors are able to capture a person's signals from various sources such as heart electrical activity (ECG), heart sounds (PCG), oxygen saturation level (PPG), brain electrical activity


(EEG), etc. The signals are continuously transmitted and consequently need to be processed in real time to extract reliable data about the current health condition. ECG is one of the most popular physiological signals, as it can accurately provide insight into an individual's heart rate, respiratory rate, and even blood pressure (mostly combined with other signals). This ability makes the ECG widely used in cases of rapid diagnosis when no other equipment is available, meaning it is suitable to use a mobile device as a processing unit [10]. However, up-to-date mobile devices still lack the storage and processing power of "big" computers and, therefore, we must take into account the limits of storage, processing, and battery when creating algorithms for real-time ECG signal processing. Usually, the biosensors do not have their own memory, and the data are sent to a portable electronic device located close to the sensor (e.g., a mobile phone or tablet). A mobile device can receive real-time data from several biosensors attached to a patient, which are used to monitor the health condition. This is a typical case of measuring vital parameters when a patient wears a portable biosensor that monitors ECG. The final goal is to monitor the health condition and to save the phone battery as much as possible. In order to provide accurate and fast algorithm performance while preserving a long battery life, the big dilemma is where to execute the algorithm: locally on the mobile device or on a remote server. For example, if a patient wants to check the heart rate and respiratory rate every 30 minutes, the dilemma is whether to send the ECG data from the customer's phone to a remote server and execute the algorithm remotely, or to execute the algorithm on the customer's phone. We set a hypothesis that local processing performs worse than remote Linux server and remote MatLab© processing in terms of storage, computation, and consequently battery consumption.
It is expected that developing a web service will provide faster processing time; however, the effect of the server response times on the energy consumption of mobile clients [16] still has to be investigated. In order to confirm the hypothesis, we conducted an experiment to evaluate the algorithm's performance and the battery consumption of the mobile device. Three scenarios are included: local processing on a mobile device, our own remote Linux-hosted web service, and a remote MatLab©-hosted service. Further on, we set up a research problem to find the most time-efficient and battery-efficient way for real-time processing of an ECG signal and calculating the ECG-derived heart rate and respiratory rate. The experiment includes testing of various ECG signal lengths from three different databases that rely on different ECG sources in terms of sampling frequency, including the Cooking Hacks e-Health Sensor Platform [11], the Savvy Sensor Platform [19], and Physionet [23]. The rest of the paper is organized as follows. Related work in the area of ECG processing algorithms designed for portable devices is given in Section 2. Section 3 presents the testing methodology, plan, and infrastructure. The results from the experiments are described in Section 4. Section 5 is dedicated to a discussion of the outcome and the processing-time trade-off, a comparison of the results, and an analysis of which environment provides the best processing time and longest battery life. The conclusion and future work are specified in Section 6.

2 Related Work

Parak and Havlik stress the importance of developing an approach that saves computing time, aiming to achieve a suitable microprocessor implementation [17]. Elgendi et al. [7] recommend a first derivative of the filtered ECG signal for long-term processing of ECG recordings and large ECG databases, since it is highly numerically efficient for the QRS detection phase. Elgendi presented a mobile phone application for QRS complex identification in large recorded ECG signals that achieves a processing time of 2.2 seconds for an ECG signal of 130 minutes [6]. A novel mobile-friendly approach for detection of QRS complexes from high-fidelity real-time ECG monitoring is provided in [2]. The authors used the continuous wavelet transform, which resulted in less computation time compared to the Pan-Tompkins algorithm. Li et al. [14] proposed a low-power and efficient QRS processor for real-time ECG monitoring based on the Haar wavelet transform and an optimized FIR filter structure. Its power consumption is only 410 nW at a 1 V voltage supply. Lahiri et al. proposed an algorithm for compressing ECG waveforms in real time and sending them through a mobile phone. They compared their algorithm's performance with several existing compression algorithms and concluded that their algorithm can achieve good compression with low distortion on the receiving side [13]. Gia et al. [8] analyzed ECG signals in smart gateways with a wavelet transform mechanism. Their experimental results showed that fog computing helped to achieve more than 90% bandwidth efficiency and a low-latency real-time response at the edge of the network. Mazomenos et al. [15] introduced a low-complexity algorithm for extracting fiducial points from ECG, designed for mobile healthcare applications. They applied the discrete wavelet transform as the principal analysis method. Their algorithm was applied to 477 ECG recordings and achieved an ideal trade-off between computational complexity and performance. Ibaida et al. [12] proposed a compression algorithm to minimize the traffic going to and coming from cloud ECG compression. Their technique achieved a high compression ratio of 40 with a Percentage Residual Difference (PRD) value of less than 1%. Wang et al. [22] developed a cloud-based system for performing massive ECG data analysis. Their system showed up to 38x performance improvement and 142x energy efficiency improvement compared to commercial systems. A cloud system providing real-time ECG data acquisition, transmission, and analysis over the Internet was presented in [24]. Evaluation of this system showed that it has good performance in terms of real-time operation, scalability, and latency. A low-cost wireless ECG data acquisition system capable of acquiring up to 22 hours of continuous ECG data is described in [1]. The authors use an Arduino Uno and Bluetooth serial communication for real-time ECG data transfer. Their system is portable and battery powered. Wang et al. [21] proposed a mobile-cloud computational solution for ECG telemonitoring. Their approach enhances mobile medical monitoring in terms of accuracy, execution, and energy efficiency. Pineda et al. [18] presented a portable system that enables the acquisition, processing, storage, and transmission of the ECG signal. Their system can record ECG data for over 120 hours in local mode; up to 10 days of ECG data can be stored in the phone in cellular mode. For remote signal visualization, the average packet delay was less than 1.73 ms, with a mean power consumption of 0.48 W, using a 2 Ah battery.

Fig. 1. The three architectural scenarios to conduct the testing


3 Testing Methodology

This section presents the testing environments, test cases, and test data.

3.1 Test Goal

The test goal is to determine the most efficient scenario in terms of performance (execution time) and resource saving (battery) to execute an ECG-derived heart rate and respiratory rate algorithm. We implemented the ECG-derived heart rate and respiratory rate algorithm relying on the idea presented by Ding et al. [5]. The algorithm accepts two parameters: a file path where the ECG signal is stored and the sampling frequency at which the signal is collected (this option is necessary since various biosensors work with different sampling rates). The pseudo code of the algorithm is presented in Fig. 2.

1: Input: signal, frequency 2: Output: HR, num respirations, RR 3: function H3R(signal[], frequency) 4: num of peaks = Pan Tompkins (signal, frequency) 5: seconds = round(length(signal)/freq) 6: HR = 1/(seconds/num of peaks) ∗ 60 7: for i =1tonumof peaks-1 do 8: k = kurtosis(signal[peak[i]:peak[i +1]) 9: smoothed k=smooth(k, rlowess) 10: RR num of peaks = findpeaks(smoothed k) 11: num respirations = length(RR num of peaks) 12: RR = 1/(seconds/breaths) ∗ 60

Fig. 2. Pseudocode of the ECG-derived heart rate and respiratory rate algorithm
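A compact executable sketch of the algorithm in Fig. 2 may help make the steps concrete. This is not the authors' implementation: the threshold-based `find_local_peaks` stands in for the Pan-Tompkins QRS detector, and a 5-point moving average replaces the rlowess smoothing.

```python
import statistics

def find_local_peaks(xs, min_height=0.0):
    """Indices of strict local maxima above min_height."""
    return [i for i in range(1, len(xs) - 1)
            if xs[i] > xs[i - 1] and xs[i] > xs[i + 1] and xs[i] > min_height]

def kurtosis(xs):
    """Excess kurtosis of a sequence (population form)."""
    m = statistics.fmean(xs)
    var = statistics.fmean([(x - m) ** 2 for x in xs])
    if var == 0:
        return 0.0
    return statistics.fmean([(x - m) ** 4 for x in xs]) / var ** 2 - 3

def ecg_hr_rr(signal, frequency):
    """Sketch of the ECG-derived HR/RR algorithm (after Ding et al. [5]).

    Simplifications: a naive amplitude-threshold peak finder instead of
    Pan-Tompkins, and a moving average instead of rlowess smoothing.
    """
    r_peaks = find_local_peaks(signal, min_height=0.5 * max(signal))
    seconds = round(len(signal) / frequency)
    hr = len(r_peaks) / seconds * 60  # heart rate, beats per minute

    # Kurtosis of each beat-to-beat segment carries the respiratory modulation.
    k = [kurtosis(signal[r_peaks[i]:r_peaks[i + 1]])
         for i in range(len(r_peaks) - 1)]
    w = 5
    smoothed = [statistics.fmean(k[max(0, i - w // 2):i + w // 2 + 1])
                for i in range(len(k))]
    resp_peaks = find_local_peaks(smoothed)
    rr = len(resp_peaks) / seconds * 60  # respiratory rate per minute
    return hr, len(resp_peaks), rr
```

Calling `ecg_hr_rr(signal, 125)` on a 60-second recording sampled at 125Hz returns the estimated heart rate, the number of detected respirations, and the respiratory rate.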

3.2 Test Scenarios

Figure 1 presents the testing environment and the architecture of the three scenarios, according to the location where the processing is performed.

– Local mobile-only scenario (M), where the algorithm is executed on the Android mobile device. The algorithm is implemented in C and built using the Android Native Development Kit (NDK).
– Remote Linux server scenario (L), where a shell script is used to transfer the locally collected ECG signals stored on the mobile device to a Linux server, which executes the algorithm. In this scenario, the C source code of the algorithm is compiled and executed on the Linux machine. The shell script is executed on the Android phone by using an Android terminal application.


– Remote MatLab© web service scenario (W), where the processing is performed through the recently released MatLab© Mobile application for Android, connected to a Windows server with MatLab© installed. The locally collected ECG signals stored on the mobile device are automatically synced to the server and accessed by the MatLab© based version of the algorithm. In this scenario, we used the originally developed MatLab© version of the algorithm.

3.3 Testing Environment

We use a Samsung Galaxy Tab S2 tablet with OS Android 6.0 (Marshmallow) as the testing environment for the experiments. The tablet has 3GB RAM, 64GB internal memory, an octa-core (4x1.9 GHz & 4x1.3 GHz) processor, and a non-removable Li-Ion 5870 mAh battery.

In the scenario M, the algorithm is executed on the tablet by a specially developed Android application. The algorithm's C source code is added to the Android application by compiling it into a native library using Android Studio, which Gradle can package with the APK file. Then, in the Java code, the functions in the native library can be called through the Java Native Interface (JNI). Our application only makes a call to the algorithm function, providing two parameters: the ECG recording file path in the tablet storage and the frequency (the sampling rate at which the ECG signal is collected).

The scenario L is based on sending the ECG recordings to a remote machine in a local computer network for further processing. We have created a shell script that connects to the remote server, transfers the recording files to it for further processing, and stores the file transfer time stamp. In order to calculate the CPU time used by each process, we sum up the User and Sys time. The remote server is a Linux machine (OS CentOS 7.2) with an octa-core processor and 8GB RAM. In order to execute the shell script on the Android tablet, we used the Android Terminal Emulator application, which provides access to Android's built-in Linux command line shell.

The scenario W uses computation offloading by sending the ECG recordings to a remote machine for further processing. In this scenario, we used the Android MatLab© Mobile application [20] and connected it to a Windows server on which MatLab© is installed. The Windows server (OS Windows 7) has a quad-core processor (1.80GHz) with 4GB RAM. We use the Tonido Server personal cloud [4], which allows us to transfer the files from the tablet to the remote server. The ECG recordings stored on the tablet are automatically synced to the server by the Android Tonido application [3]. Then, they are accessed by the MatLab© based version of the algorithm running on the server. In this scenario, we used the originally developed MatLab© version of the algorithm.

Tablet battery consumption is evaluated by using the Android GSam Battery Monitor application [9]. The battery is fully charged at the beginning of each test scenario.


3.4 Test Cases

To cover variable lengths and sources of ECG, we used three different databases and a total of 1297 ECG recordings, divided into the following categories:

1. Short-term measurements (small files). A total of 1191 ECG recordings with lengths varying from 10 to 90 seconds at a sampling rate of 125Hz were created by using the Cooking Hacks e-Health Sensor Platform. This platform allows Arduino and Raspberry Pi to interact with the shield and be used for biometric and medical applications. The ECG sensor in this set is a single-lead ECG which uses three ECG electrodes. These basic leads acquire enough information for rhythm monitoring, but are insufficient for determination of ST elevation (since there is no lead that gives information about the anterior wall) [11].
2. Medium-term measurements (medium file length). A total of 22 ECG recordings with lengths varying from 300 to 800 seconds at 125Hz were created by using the Savvy Sensor Platform. SAVVY ECG is a portable medical device for continuous and accurate monitoring of heart rhythm. It is a comprehensive and reliable solution for patients with cardiovascular diseases, athletes, the elderly, and people with an active lifestyle who want to monitor their own health [19].
3. Long-term measurements (large file length). A total of 84 ECG recordings with lengths varying from 2860 to 5170 seconds at 360Hz were obtained from the Massachusetts General Hospital-Marquette Foundation Hemodynamic and Electrocardiographic Database (MGH/MF) on PhysioNet [23].

Our test cases include files of different sizes from all databases: from 4KB up to 52KB from the first database, 59KB up to 9571KB from the second database, and 8655KB up to 17273KB from the third database. A total of 1.65GB of data has been processed in each experiment.

3.5 Test Data

The measured data in the experiment are the execution time and battery consumption. Furthermore, we calculate the average algorithm time efficiency per file size and the total algorithm time efficiency over all files. In order to calculate the average algorithm time efficiency, we split the files into categories according to file size and find the average algorithm performance behavior. The total algorithm time efficiency is the sum of the algorithm time performances for all 1297 files. The average algorithm time efficiency and total algorithm time efficiency are calculated separately for each testing scenario. The total execution time is defined by (1), where T_i is the execution time for each file i and n is the total number of files.

T_total = Σ_{i=1}^{n} T_i    (1)


T_avg is the average execution time for each unique file size in KB, defined by (2), where k is the number of files of equal file size s.

T_avg(s) = (Σ_{i=1}^{k} T_i(s)) / k    (2)
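Equations (1) and (2) amount to a sum and a per-size grouping; a minimal sketch (the measurement pairs in the test are made up, and the function name is ours):

```python
from collections import defaultdict

def total_and_average_times(measurements):
    """measurements: iterable of (file_size_kb, exec_time_s) pairs.
    Returns T_total per Eq. (1) and a {size: T_avg(size)} map per Eq. (2)."""
    t_total = sum(t for _, t in measurements)  # Eq. (1)
    by_size = defaultdict(list)
    for size, t in measurements:
        by_size[size].append(t)
    # Eq. (2): mean execution time over the k files of each distinct size.
    t_avg = {s: sum(ts) / len(ts) for s, ts in by_size.items()}
    return t_total, t_avg
```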

Fig. 3. Average execution times (small files category).


Fig. 4. Average execution times (medium files category).

Note that the average times T_avgM, T_avgL and T_avgW are calculated for the scenarios M, L, and W, respectively.



Fig. 5. Average execution times (large files category).

We take the scenario M as the reference and compare it against the results obtained for the L and W scenarios. The relative speedup RP is the ratio of average execution times for a pair of test scenarios (3); we calculate RP_ML and RP_MW correspondingly.

RP_ML = T_avgM(s) / T_avgL(s),    RP_MW = T_avgM(s) / T_avgW(s)    (3)

Additionally, we calculate the speedup RPLW by (4) to compare the scenarios L and W.

RP_LW = T_avgL(s) / T_avgW(s)    (4)

The total battery consumption is calculated for the 1297 processed files in each of the M, L and W scenarios. The total battery consumption is defined by (5), where B_i is the battery consumption for each file i and n is the total number of files.

B_total = Σ_{i=1}^{n} B_i    (5)
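Equations (3)-(5) map directly to code; a small sketch (the function names are ours, and the example values in the usage are illustrative only):

```python
def relative_speedups(t_avg_m, t_avg_l, t_avg_w):
    """Relative speedups RP_ML, RP_MW (Eq. 3) and RP_LW (Eq. 4)
    for one file size, from the per-scenario average times."""
    return {"RP_ML": t_avg_m / t_avg_l,
            "RP_MW": t_avg_m / t_avg_w,
            "RP_LW": t_avg_l / t_avg_w}

def total_battery(consumptions):
    """Total battery consumption B_total (Eq. 5) over all processed files."""
    return sum(consumptions)
```

A value of RP_ML above 1 means the L scenario was faster than M for that file size, and RP_LW near 1 means L and W performed alike.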

4 Experimental Results

This section presents the results obtained from the defined testing scenarios.

4.1 Total Performance

A total of 1.65GB of data is processed and the summary results are provided in Table 1.


Table 1. Summary of the execution time and battery consumption.

Scenario  T_total (sec)  B_total (%)
M         10551.06       91
W         7715.07        36
L         7524.89        30

The first column presents each scenario, the second shows the total execution time per scenario, and the third represents the total battery consumption. The average execution time has been calculated for each of the distinct file sizes, ranging from 4KB to 17MB. Analyzing the results, the scenario M is the worst case: the total time for processing the ECG files is 10551 seconds, and 91% of the device battery is spent. The processing times for scenarios W and L do not differ significantly (W performs 2.5% worse than L); however, the battery consumption in the W case is ∼16% worse than in the L scenario. One can conclude that transferring the ECG data for remote execution of the algorithm on the Linux server is the fastest scenario and the most efficient one in terms of battery consumption.
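As a quick arithmetic check of the two percentages quoted above, using the Table 1 values (variable names are ours):

```python
t_total = {"M": 10551.06, "W": 7715.07, "L": 7524.89}  # seconds (Table 1)
b_total = {"M": 91, "W": 36, "L": 30}                  # battery % (Table 1)

# W is ~2.5% slower than L in total execution time.
w_slower_pct = (t_total["W"] - t_total["L"]) / t_total["L"] * 100

# L saves ~16% of W's battery consumption.
l_battery_saving_pct = (b_total["W"] - b_total["L"]) / b_total["W"] * 100
```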

4.2 Dependence on file size

We have inspected the execution times in all three scenarios considering the size variability of the ECG files. To represent the very small execution times alongside the huge values, we use a base-10 logarithmic normalization of the average execution times per distinct ECG file size. Considering the file sizes available for our research, we can clearly distinguish three categories: a small files category containing files less than 1MB in size, a medium files category with sizes ≥ 1MB and ≤ 9MB, and a large files category with files ≥ 10MB. Figures 3, 4 and 5 present the normalized average execution times (y-axis) for each file size in each category (x-axis) for the M, L and W scenarios.

The figures clearly show that the M scenario performance follows a logarithmically increasing trend. Although it performs considerably better for file sizes up to 5MB, it is worse for larger file sizes. Considering the L and W scenarios, it can be seen that they both follow the same performance trends proportional to the rise of the file size. Even though the scenario M is ∼1.3 times faster than the scenarios W and L for files up to 5MB, it is ∼2 times slower for large files, with the absolute differences in time measured in hundreds and thousands of seconds. The scenario M is not acceptable for unlimited ECG file sizes in terms of processing time.



Fig. 6. Relative speedup (small files category).


Fig. 7. Relative speedup (medium files category).


Fig. 8. Relative speedup (large files category).

5 Discussion

5.1 Speedup Analysis

Figures 6, 7 and 8 present the relative speedups for all pairs of scenarios. To represent the high range of the values, we again use a base-10 logarithmic scale.


Table 2. Comparison of processing times considering chunks.

File size  #chunks   Chunk size  T_M (sec)    T_W (sec)    T_L (sec)
5MB        1         5MB         25.78        25.32        24.18
5MB        2         2.5MB       17.98        36.80        35.14
5MB        5000      1KB         0.31         2505.90      2393.20
1GB        1         1GB         80954.00     290.47       277.40
1GB        2         500MB       56466.00     422.18       403.18
1GB        200       5MB         5156.80      5063.70      4835.80
1GB        250       4MB         4592.10      5711.50      5454.40
1GB        1000000   1KB         61.66        501190.00    478630.00
60GB       1         60GB        40785000.00  1914.10      1828.00
60GB       2         30GB        28448000.00  2782.10      2656.90
60GB       12000     5MB         309410.00    303820.00    290150.00
60GB       15000     4MB         275530.00    342690.00    327270.00
60GB       60000000  1KB         3699.60      30071000.00  28718000.00

One can observe a linear speedup of M vs. W and M vs. L for file sizes up to approximately 5MB. Above this size, the speedup stabilizes and no longer increases. Analyzing the relative speedup of the L vs. W scenario, we can see that W is never better than L and the log speedup is approximately 0, except for a small range where the L scenario is better, leading to the conclusion that there is no significant difference between their execution times.

5.2 File size dependence

Table 2 shows the processing times for different file and chunk sizes. Processing a single 60GB file would take 40785000 (M), 1914 (W), and 1828 (L) seconds. If the file were divided into 2 chunks of 30GB each, the processing time would decrease to 28448000 seconds in the M scenario, but would increase to 2782 and 2657 seconds in the W and L scenarios, respectively. Considering 5MB to be the threshold at which the processing times for all three scenarios are expected to be nearly equal, dividing a 60GB file into 12000 chunks of 5MB each resulted in the expected outcome of nearly identical processing times. To test chunk sizes below 5MB, we split the large file into 15000 chunks of 4MB each and obtained the fastest processing time for the M scenario. Moreover, with the smallest chunk size of 1KB, the results again showed that the M scenario yields the best processing time. Testing also with 1GB and 5MB file sizes, we can conclude the following:

1. If there is no Internet connection available, files must be processed locally on the mobile device (scenario M) and should be split into chunks of the smallest size of ∼1KB;
2. If an Internet connection is available, files larger than 5MB should be sent to the remote Linux server (scenario L);


3. If an Internet connection is available, files less than 5MB should be split into chunks of the smallest size of ∼1KB and processed locally on the mobile device (scenario M).
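The three rules can be condensed into a small helper (a sketch: the 5MB threshold and the ∼1KB chunk size come from the measurements above, and the function name is ours):

```python
MB = 1024 * 1024
KB = 1024

def choose_strategy(file_size_bytes, internet_available):
    """Pick a processing strategy from the rules derived above.

    Returns (scenario, chunk_size_bytes or None). The 5 MB threshold and
    the ~1 KB chunk size are taken from the experiments in this section.
    """
    if not internet_available:
        # Rule 1: no connectivity -> process locally in ~1 KB chunks.
        return ("M", KB)
    if file_size_bytes > 5 * MB:
        # Rule 2: large files go to the remote Linux server.
        return ("L", None)
    # Rule 3: small files are processed locally in ~1 KB chunks.
    return ("M", KB)
```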

6 Conclusion and Future Work

In this paper, we have analyzed the performance of ECG processing on mobile devices, including smartphones, tablets, and similar devices. The performance analysis includes the response time, battery consumption, and resilience of an algorithm for ECG-derived heart rate and respiratory rate. To draw relevant conclusions, we realized three experiments for processing ECG files of various lengths, each of them providing a different execution environment. In the first experiment, files were processed locally on a portable device. In the second, files were processed remotely on a Linux server, and in the third, files were processed remotely via the MatLab© web service. The target goal was to analyze how to choose the more performance-effective and battery-effective solution.

The hypothesis that the mobile-only solution performs worse than the offloading solutions is valid only for files larger than 5MB. However, if a large file is split into chunks, then the mobile-only solution performs better. If the files are split into chunks whose size is less than 5MB, then processing them locally is acceptable even for large files. This conclusion does not apply to battery consumption; in those cases, one needs to offload the computation in order to preserve the battery.

We have modeled the performance behavior and compared the experimental and theoretical results. The theoretical (predicted) response times were shown to follow the trend of the actually measured response times. The experimental results resolve the customer's dilemma of which option provides better performance and is more battery-effective when processing ECG files of various sizes. We will continue to perform the same and similar experiments with different mobile devices.

References

1. Ahamed, M.A., Asraf-Ul-Ahad, M., Sohag, M.H.A., Ahmad, M.: Development of low cost wireless ECG data acquisition system. In: Advances in Electrical Engineering (ICAEE), 2015 International Conference on. pp. 72–75. IEEE (2015)
2. Amiri, A.M., Mankodiya, K., et al.: m-QRS: An efficient QRS detection algorithm for mobile health applications. In: E-health Networking, Application & Services (HealthCom), 2015 17th International Conference on. pp. 673–676. IEEE (2015)
3. CodeLathe Technologies Inc: Tonido file access share sync, https://play.google.com/store/apps/details?id=com.tonido.android


4. CodeLathe Technologies Inc: Tonido server software, https://www.tonido.com/
5. Ding, S., Zhu, X., Chen, W., Wei, D.: Derivation of respiratory signal from single-channel ECGs based on source statistics. International Journal of Bioelectromagnetism 6(2), 43–49 (2004)
6. Elgendi, M.: Fast QRS detection with an optimized knowledge-based method: Evaluation on 11 standard ECG databases. PloS one 8(9), e73557 (2013)
7. Elgendi, M., Eskofier, B., Dokos, S., Abbott, D.: Revisiting QRS detection methodologies for portable, wearable, battery-operated, and wireless ECG systems. PloS one 9(1), e84018 (2014)
8. Gia, T.N., Jiang, M., Rahmani, A.M., Westerlund, T., Liljeberg, P., Tenhunen, H.: Fog computing in healthcare internet of things: A case study on ECG feature extraction. In: Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference on. pp. 356–363. IEEE (2015)
9. GSam Labs: GSam battery monitor, https://play.google.com/store/apps/details?id=com.gsamlabs.bbm&hl=en
10. Gusev, M., Stojmenski, A., Guseva, A.: ECGalert: A heart attack alerting system. In: International Conference on ICT Innovations. pp. 27–36. Springer (2017)
11. Hacks, C.: e-Health sensor platform v2.0 for Arduino and Raspberry Pi [biometric/medical applications] (2013)
12. Ibaida, A., Al-Shammary, D., Khalil, I.: Cloud enabled fractal based ECG compression in wireless body sensor networks. Future Generation Computer Systems 35, 91–101 (2014)
13. Lahiri, K., Mandal, A., Sinha, S., Mukherjee, A.: Wireless personal health ECG monitoring application. In: Wireless Personal Multimedia Communications (WPMC), 2014 International Symposium on. pp. 618–623. IEEE (2014)
14. Li, P., Jiang, H., Yang, W., Liu, M., Zhang, X., Hu, X., Pang, B., Yao, Z., Chen, H.: A 410-nW efficient QRS processor for mobile ECG monitoring in 0.18-µm CMOS. In: Biomedical Circuits and Systems Conference (BioCAS), 2016 IEEE. pp. 14–17. IEEE (2016)
15. Mazomenos, E.B., Biswas, D., Acharyya, A., Chen, T., Maharatna, K., Rosengarten, J., Morgan, J., Curzen, N.: A low-complexity ECG feature extraction algorithm for mobile healthcare applications. IEEE journal of biomedical and health informatics 17(2), 459–469 (2013)
16. Miettinen, A.P., Nurminen, J.K.: Energy efficiency of mobile clients in cloud computing. HotCloud 10, 4–4 (2010)
17. Parák, J., Havlík, J.: ECG signal processing and heart rate frequency detection methods. Proceedings of Technical Computing Prague 8, 2011 (2011)
18. Pineda-López, F.M., Martínez-Fernández, A., Rojo-Álvarez, J.L., Blanco-Velasco, M.: Design and optimization of an ECG/Holter hybrid system for mobile systems based on dsPIC. In: Computing in Cardiology Conference (CinC), 2014. pp. 453–456. IEEE (2014)
19. Savvy: Monitor the activity of your heart, http://www.savvy.si
20. The MathWorks, Inc.: MATLAB Mobile, https://play.google.com/store/apps/details?id=com.mathworks.matlabmobile&hl=en
21. Wang, X., Gui, Q., Liu, B., Jin, Z., Chen, Y.: Enabling smart personalized healthcare: a hybrid mobile-cloud approach for ECG telemonitoring. IEEE journal of biomedical and health informatics 18(3), 739–745 (2014)


22. Wang, X., Zhu, Y., Ha, Y., Qiu, M., Huang, T.: An FPGA-based cloud system for massive ECG data analysis. IEEE Transactions on Circuits and Systems II: Express Briefs (2016)
23. Welch, J., Ford, P., Teplick, R., Rubsamen, R.: The Massachusetts General Hospital-Marquette Foundation hemodynamic and electrocardiographic database – comprehensive collection of critical care waveforms. Clinical Monitoring 7(1), 96–97 (1991)
24. Zhou, S., Zhu, Y., Wang, C., Gu, X., Yin, J., Jiang, J., Rong, G.: An FPGA-assisted cloud framework for massive ECG signal processing. In: Dependable, Autonomic and Secure Computing (DASC), 2014 IEEE 12th International Conference on. pp. 208–213. IEEE (2014)


Public NTP Stratum 2 Server for Time Synchronization of Devices and Applications in Republic of North Macedonia

Bogdan Jeliskoski1, Veno Pachovski1, Dobre Blazevski1, Irena Stojmenovska1

1 University American College Skopje, III Makedonska brigada br. 60, 1000 Skopje, Republic of North Macedonia [email protected], {pachovski, dobre.blazevski, stojmenovska}@uacs.edu.mk

Abstract. With the development of new hardware, applications, and Internet of Things (IoT) devices, accurate time synchronization is more necessary than ever. Having a geographically close Network Time Protocol (NTP) server for the hardware devices or applications that need to synchronize time means accurate time synchronization, near atomic time, with fast server response. In this research paper, we explain the creation of the first public NTP Stratum 2 server in the Republic of North Macedonia, built upon the latest VMware virtualization technology, which will serve the territory of the Republic of North Macedonia and the rest of Europe. The NTP server satisfies all required standards. Registration in the official worldwide NTP register has already been conducted, and the server is available to everyone in the world.

Keywords: network time synchronization, NTP, public, virtualization, VMware ESXi, server, networking

1. Introduction

The scope of the project is to create a stable and reliable public NTP Stratum 2 server for the Republic of North Macedonia and the rest of Europe. Currently, there is no other NTP server dedicated to serving the region of the Republic of North Macedonia. The main goals of this project are better timing services for the clients and offloading the utilization of the ISPs' (Internet Service Providers) main links towards their global routing peers and main uplinks. Prior to this project there was no public NTP server available in the state; all personal computers, laptops, cellphones, and other


devices and applications that use SNTP/NTP were forced to use external NTP servers outside the state. With a local NTP server publicly available, all NTP connections would terminate at our NTP server, which would drastically improve global network utilization and provide a better time service for all clients.

The NTP v4 protocol, as defined by RFC 5905 (specification of the protocol and algorithms), is used in this project; it is backward compatible with version 3. Version 4 of the NTP protocol overcomes significant deficiencies in the design of Version 3, correcting specific errors that could occur, introducing new features such as extended NTP time signatures, and encouraging the use of floating double data types in the synchronization procedure [1]. As a result, the time resolution is better than one nanosecond, and the frequency resolution is less than one nanosecond per second [1]. Additional improvements include a new clock discipline algorithm that is more responsive to the hardware clock frequency of the system. Typical primary servers using modern machines are precise within a few dozen microseconds. Typical secondary servers and clients on fast local networks are within a few hundred microseconds with polling intervals of up to 1024 seconds [1]. With NTP Version 4, servers and clients are precise with polling intervals of up to 36 hours.

For the purposes of this project, we will first address the NTP protocol versions and their security, and then briefly introduce the hierarchy of SNTP servers. We will then explain our project setup, including the hardware requirements, the installation and configuration of the VMware ESXi Hypervisor, the installation and configuration of Fedora 30 and backup, the network and firewall setup, and the configuration of advanced DNS settings for the ntpmk.net hostname, as well as the results from the testing.

2. Versions of NTP Protocol

The first NTP implementations date back to the early 1980s, with an accuracy of several hundred milliseconds. The first specification was released in RFC 778, under a different name, Internet Clock Service, where delay measurements to any host could be made by sending an ICS datagram. The official introduction was in RFC 958 in 1985, where the packets seen on the network are described and basic calculations were introduced [2-4].

Version 1 of the NTP protocol was released in 1988 in RFC 1059. In this version, symmetric operation and a client-server mode were introduced. Version 1 was built on the IP protocol and the User Datagram Protocol (UDP), to provide connectionless transport from source to destination. It was designed for maintaining accuracy and robustness [2, 5].

Symmetric-key authentication with DES-CBC was introduced in Version 2 of the NTP protocol. DES is a symmetric key algorithm with a key size of 64 bits. The mechanism must derive a raw 64-bit key, and this process is not responsible for parity or weak key checks; additional checking for weak keys, rejecting them if found, would be performed, and a new SA would be requested accordingly [6]. This version of NTP was officially published by the IETF in RFC 1119 [2]. This RFC was obsoleted by the next generation of NTP, Version 3, described in RFC 1305 [6, 7].


Version 3 of the NTP protocol was published in 1992 in RFC 1305. In this version, the esterror and maxerror formal correctness principles were specified and implemented, and the algorithms for combining the offsets of a number of peer timeservers were revised and optimized. In addition, a broadcast mode was implemented [2, 6].

A client sends an NTP request or message to one or more servers and processes the replies as received. The server locally changes addresses and ports and overwrites some fields in the message. The checksum is recalculated, and the message is returned to the client. With this message, a client can determine the server time and adjust the local clock. Information for choosing the best servers and the expected timekeeping accuracy and reliability is stored in the same message [2, 7].

A new algorithm is introduced in Version 3 of NTP to combine the offsets of a number of peer time servers, modeled on the one used by national standards laboratories for combining the weighted offsets from a number of standard clocks to construct a synthetic laboratory time scale. This new algorithm improves accuracy and stability and reduces the errors caused by the asymmetric paths that exist in the Internet [7].

2.1 SNTP Protocol Version 4

Version 4 is the latest version of the NTP protocol, as defined in RFC 2030, named Simple Network Time Protocol (SNTP) Version 4 for IPv4, IPv6 and OSI [8]. The protocol and algorithms are defined in RFC 5905. Header fields are reinterpreted in this version to support the IPv6 and OSI addressing models. In addition, optional anycast extensions are implemented for the multicast service [8]. The main difference between Version 3 and Version 4 of the NTP protocol is the interpretation of the four-octet reference identifier field, used to detect and avoid synchronization loops [8]. SNTP Version 4 can work in unicast mode (point-to-point), anycast mode (multipoint-to-point), and multicast mode (point-to-multipoint) [8, 9]. Fig. 1 presents the common NTP packet format, with all parameters.

Fig. 1. Common format of NTP packet [10]
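To make the header layout concrete, here is a minimal SNTP client sketch built around the 48-byte packet format of Fig. 1 (a sketch only: the pool hostname is just an example, error handling is omitted, and the function names are ours):

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2_208_988_800  # seconds between the NTP epoch (1900) and the Unix epoch (1970)

def build_sntp_request():
    """48-byte SNTP v4 client request: LI = 0, VN = 4, Mode = 3 (client)."""
    packet = bytearray(48)
    packet[0] = (0 << 6) | (4 << 3) | 3  # LI | VN | Mode in the first octet
    return bytes(packet)

def parse_transmit_time(reply):
    """Unix time from the Transmit Timestamp field (bytes 40-47,
    big-endian seconds + fraction)."""
    secs, frac = struct.unpack("!II", reply[40:48])
    return secs - NTP_EPOCH_OFFSET + frac / 2**32

def sntp_query(server="pool.ntp.org", timeout=2.0):
    """One unicast SNTP round trip; returns the server's transmit time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_sntp_request(), (server, 123))
        reply, _ = s.recvfrom(48)
    return parse_transmit_time(reply)
```

A real client would also fill in and check the timestamps described below to compute delay and offset, rather than trusting a single transmit timestamp.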


As its reference clock, NTP uses Coordinated Universal Time (UTC), as distinct from Greenwich Mean Time (GMT). UTC is based on the International Atomic Time (TAI) standard, which defines one second as 9,192,631,770 periods of the radiation emitted by a caesium-133 atom in the transition between the two hyperfine levels of its ground state; this means the NTP protocol must implement leap second adjustments from time to time [10, 11]. The transmission delay can be calculated as the total time from the transmission of the poll to the reception of the response, minus the time recorded by the server for processing the poll and generating a response:

δ = (T4 – T1) – (T3 – T2) [10]

The client's clock offset from the server clock can be estimated as:

θ = ½ [(T2 – T1) + (T3 – T4)] [10]
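The two formulas map directly to code; a small sketch with made-up timestamps (symmetric 100 ms path delay, 10 ms server processing, client clock 250 ms behind the server):

```python
def ntp_delay_offset(t1, t2, t3, t4):
    """Round-trip delay δ and clock offset θ from the four timestamps:
    t1 = client transmit, t2 = server receive,
    t3 = server transmit, t4 = client receive (all in seconds)."""
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return delay, offset

# Made-up example timestamps matching the scenario described above:
d, o = ntp_delay_offset(t1=0.000, t2=0.350, t3=0.360, t4=0.210)
# d ≈ 0.200 s (time spent on the wire), o ≈ 0.250 s (to add to the client clock)
```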

2.2 Security Analysis of NTP Protocol

In 2016, NTP servers suffered large-scale attacks due to a security exploit. As a result, ISPs and IXPs decided to block NTP traffic to the abused servers. If servers are not patched correctly, only 5,000 servers are needed to generate a massive Distributed Denial of Service (DDoS) attack of around 400 Gbps [12]. The methods used for DDoS attacks do not rely on exploits of the NTP protocol or the NTP daemon, but mainly on poor configuration and the lack of a proper firewall setup. NTP servers are a prime target for attackers because, as stated, every personal computer, laptop, or cellphone today has a built-in NTP client. Time synchronization is the biggest factor in further vulnerabilities that can compromise the server or the client [12]. There have been cases where NTP was used to manipulate logs and change the time on computer systems in order to alter the recorded events [12].

The following can be used as vulnerabilities to attack the NTP protocol [13]: the client/server and symmetric modes, the rate management provisions, the digest layer of the message's symmetric key cryptography, the Autokey protocol, and modification of the NTP header. Because NTP operates over UDP, and thus needs neither a protocol handshake nor a record of the request, the primary exploits are DDoS attacks. Upgrading to the latest 4.2.7 version of the NTP daemon is highly recommended [13, 14]. However, routers and firewalls are the key factor for securing and optimizing NTP servers, as will be discussed later.

3. SNTP Servers

There are two types of NTP timeservers, primary and secondary. A primary timeserver, or Stratum 1 server, receives a UTC time signal directly from an authoritative clock source such as an atomic clock or a GPS signal source. A secondary timeserver, or Stratum 2 server, receives its time signal from one or more upstream NTP servers, while distributing time signals to other servers or clients. The chain of Stratum servers continues: a Stratum 3 server receives its time signal from a Stratum 2 server, and a stratum n server can peer with many stratum n-1 servers to maintain a reference clock signal [11]. The Stratum hierarchy can be seen in Fig. 2.


Fig. 2. Detailed hierarchy of the Stratum servers

4. NTPMK – Requirements and Setup

In this section, the overall requirements and setup are presented.

4.1 Hardware Requirements

The hardware requirements were chosen with the VMware Hypervisor in mind, because every hardware component must be compliant with the software. The main server hardware is described in the following Table 1:

TABLE I: Main hardware components for the NTP Server.

Component        Model                                       Compliant with VMware ESXi  Price
Motherboard      ASUS M5A78L LE, AMD 760G-780L / SB710       YES                         50 EUR
CPU              AMD FX-8350 Eight-Core Processor            YES                         60 EUR
RAM              16 GB 1600 MHz                              YES                         60 EUR
HDD/SSD          1 x 1TB Samsung HDD Pro, 1 x 500GB WD Blue  YES                         100 EUR
Power Supply     Cooler Master WAT Lite 700W                 YES                         50 EUR
Network cards    2 x Intel NIC Pro 1000                      YES                         50 EUR
UPS              Smart ONLINE 3 kVA, AVR and OVR             YES                         150 EUR
Router/Firewall  Mikrotik RB2011                             YES                         100 EUR
L2 Switch        Cisco SG300                                 YES                         180 EUR
TOTAL                                                                                    1420 EUR


4.2 VMware ESXi 6.7 U2 Hypervisor Configuration

VMware ESXi 6.7 U2 Hypervisor was used for this project. ESXi is a bare-metal operating system, designed specifically for handling virtual machines with high throughput and load. Core features of ESXi include [15]: memory over-commitment and de-duplication; scalability up to 64 logical processor cores, 256 virtual CPUs and 1 TB of RAM per host; shaping of network traffic; use of VLANs; NIC teaming; and the esxcli command line. Fig. 3 shows a brief core design of VMware ESXi, and Fig. 4 shows an actual summary tab of VMware ESXi.

Fig. 3. VMware ESXi core structure

Fig. 4. An actual screen of VMware ESXi 6.7 U2 used in this project

For better security, the ESXi infrastructure server has two network cards. NIC0 serves the production network, as shown in Fig. 5, where all VM servers have access to the main router, and NIC1 is used only for management purposes, as can be seen in Fig. 6.


Fig. 5. An actual screen of production network in ESXi server

Fig. 6. An actual screen of management network in ESXi server

4.3 Fedora Server 30 and backup server installation and configuration

Installation of the Fedora operating system was done by writing the image to a USB stick and installing from it. The IP address and DNS settings are acquired through the router's DHCP server, which uses statically assigned DHCP leases as well as static ARP entries. With the security issues mentioned above in mind, after the installation the system must be brought up to date with the newest updates and security patches using the following commands in a terminal:

sudo dnf upgrade --refresh
sudo dnf update

For security reasons, only SSH connections are permitted to the server, and the standard port TCP/22 is changed to port 2882. Cockpit [16], the GUI management center integrated into Fedora and listening on TCP port 9090, is left at its defaults. iptables is used for the server's local firewall rules.
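The hardening described above boils down to a few configuration lines. The exact file contents are not given in the paper, so the sshd_config line and the iptables rules below are a sketch assuming standard Fedora defaults:

```
# /etc/ssh/sshd_config -- move SSH off the standard port
Port 2882

# iptables rules (iptables-save format), assumed: allow SSH on
# TCP 2882, Cockpit on TCP 9090, and the NTP service on UDP 123
-A INPUT -p tcp --dport 2882 -j ACCEPT
-A INPUT -p tcp --dport 9090 -j ACCEPT
-A INPUT -p udp --dport 123 -j ACCEPT
```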

The newest NTP daemon was installed from the repository. Configuration of the NTP server is done through the ntp.conf file in the /etc directory, edited with the standard vi or nano editor in Linux. Below is the complete configuration built for the NTP server.

# Record the frequency of the system clock.
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noepeer noquery
# Permit association with pool servers.
restrict source nomodify notrap noepeer noquery


# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1
# Hosts on local network are less restricted.
restrict 10.10.10.0 mask 255.255.255.0 nomodify notrap
restrict -4 default kod noquery nomodify notrap nopeer limited
restrict -6 default kod noquery nomodify notrap nopeer limited
# Use public servers from the pool.ntp.org project.
server ntp.bsdbg.net iburst
server ptbtime1.ptb.de iburst
server zeit.fu-berlin.de iburst
server ntp0.fau.de iburst
# Reduce the maximum number of servers used from the pool.
tos maxclock 5
# Enable public key cryptography.
#crypto includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys

As the NTPMK server is a Stratum 2 server, four Stratum 1 servers were chosen for time synchronization. The servers were chosen to be geolocated close to NTPMK, with the best possible latency and reliability; for this purpose, servers located in Bulgaria and Germany are used. The NTP daemon then automatically chooses which server to synchronize with, according to the best possible time offset, as can be seen in Fig. 7, obtained by issuing the ntpq -p command in the terminal.

Fig. 7. Screen of ntpq -p command issued in NTPMK server
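The offset and delay columns reported by ntpq -p are derived from the four timestamps of an NTP exchange, as defined in RFC 5905 [1]. A small sketch of that arithmetic (the sample timestamps are made up for illustration):

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Clock offset and round-trip delay from the four NTP
    timestamps (RFC 5905): t1 client send, t2 server receive,
    t3 server send, t4 client receive (all in seconds)."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# Hypothetical exchange: ~40 ms each way, client clock 5 ms behind
offset, delay = ntp_offset_delay(100.000, 100.045, 100.046, 100.081)
print(round(offset, 3), round(delay, 3))  # -> 0.005 0.08
```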

This means that users will always have the best service at any time. In order to provide reliable operation, backup of critical files is crucial. Backup of all configurations to an external local server is done with a cron job on the Fedora server.


With this command, a backup of all important folders is made in case of disaster. The cron job is configured as described below:

0 4 * * 0 rsync -avzh --progress --delete -e ssh /etc /home /root /var/www [email protected]:/backup-ntp >/dev/null 2>&1

4.4 Network and Firewall Setup

For the router and firewall device, a Mikrotik RB2011 is used, as can be seen in Fig. 8.

Fig. 8. An actual picture of Mikrotik RB2011 used as main router and firewall for NTP server.

The Mikrotik RB2011 uses RouterOS, an operating system developed by Mikrotik [17]. RouterOS can be managed through a terminal or via the Winbox GUI application [18]. Fig. 9 shows a detailed diagram of the network setup, where the Mikrotik is placed in front of the VMware ESXi hypervisor and behind the main gateway. The separation of the production and management networks is also shown.


Fig. 9. A detailed network diagram with all elements

The Mikrotik Internet configuration uses static local IP parameters assigned from the main gateway. For DNS, OpenDNS servers, developed and maintained by a Cisco division [19], are used for better security. As can be seen in Fig. 9, a separate L3/L2 bridge domain is configured for the NTPMK server to separate production from management, meaning a separate DHCP server with statically assigned leases. Source NAT (SRCNAT) is used for outbound connections and destination NAT (DSTNAT) for inbound connections; destination NAT is also used for port forwarding [20]. Fig. 10 shows the NAT settings for the NTPMK server, and Fig. 11 shows the forward chain of the firewall settings.


Fig. 10. NAT settings in Mikrotik router for NTPMK

Fig. 11. Forward firewall chain settings in Mikrotik router for NTPMK
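Figs. 10 and 11 show these rules only as screenshots. In RouterOS terminal syntax, the inbound DSTNAT forward for the NTP service would look roughly as follows; the WAN interface name ether1 and the internal address 10.10.10.2 are assumptions for illustration, not values taken from the figures:

```
/ip firewall nat
add chain=dstnat in-interface=ether1 protocol=udp dst-port=123 action=dst-nat to-addresses=10.10.10.2 to-ports=123
add chain=srcnat out-interface=ether1 action=masquerade
```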

Regarding security, the firewall rules are specifically planned and designed for overall protection of the whole network. Some rules are logged for debugging and analytics purposes. Connection tracking has been tuned for better utilization of the router's CPU and for DDoS mitigation. Moreover, the UDP timeout and UDP stream timeout are set to their minimum values so that UDP sessions expire faster. This is very useful for offloading the router's CPU and for mitigating DDoS attacks.
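In RouterOS, those timeouts are properties of connection tracking. A sketch of the kind of setting meant (the 10 s values are illustrative; the paper only says the timeouts are set to a minimum):

```
/ip firewall connection tracking
set udp-timeout=10s udp-stream-timeout=10s
```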

4.5 Configuration of Advanced DNS Settings for ntpmk.net Hostname

The hostname for the NTPMK server is configured to use the ntpmk.net domain both as an NTP server and as an http/https web site for information. The public IP address of the server is 78.157.31.21 and the domain is registered with Name Cheap [21], a hostname and DNS provider. To bind the domain name to the IP address, two "A" records are used in the DNS settings, and a CNAME record is used for the positive SSL certificate of the web site, as can be seen in Fig. 12. With this, clients can use either the domain name ntpmk.net or the IP address in their NTP clients for time synchronization.

Fig. 12. Advanced DNS settings for the ntpmk.net domain, used for NTP synchronization and http/https web site for information


4.6 Testing and Registration

Testing and comparison were performed against the most popular NTP server, time.windows.com, which is integrated into all Windows operating systems. Trace route and connection lag were measured against time.windows.com and compared with ntpmk.net. Fig. 13 shows how traffic travels from a source IP in the Republic of North Macedonia to Microsoft's destination NTP server, and Fig. 14 shows the traffic flow towards the NTPMK server. The Open Visual Trace Route application [22] and the standard ntpdate -q command on Linux systems [23] were used for this purpose. Testing the time.windows.com NTP server with the Open Visual Trace Route application:

Fig. 13. Testing trace route against time.windows.com NTP server

Testing ntpmk.net NTP server with Open Visual Trace Route application:

Fig. 14. Testing trace route against ntpmk.net NTP server

The following is the result obtained from testing the time.windows.com NTP server with the ntpdate -q command:

[root@serv2 ~]# ntpdate -q time.windows.com
server 51.140.127.197, stratum 2, offset -0.005221, delay 0.08374
22 May 22:17:20 ntpdate[12998]: adjust time server 51.140.127.197 offset -0.005221 sec

The following is the result obtained from testing the ntpmk.net NTP server with the ntpdate -q command:

[root@serv2 ~]# ntpdate -q ntpmk.net
server 78.157.31.21, stratum 2, offset 0.000653, delay 0.02681
22 May 22:18:55 ntpdate[13015]: adjust time server 78.157.31.21 offset 0.000653 sec
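The two ntpdate -q runs above can be compared directly on their reported offset and delay values. A small sketch using exactly the figures printed above:

```python
# Offset and delay (seconds) reported by the two ntpdate -q runs above
results = {
    "time.windows.com": {"offset": -0.005221, "delay": 0.08374},
    "ntpmk.net":        {"offset":  0.000653, "delay": 0.02681},
}

def better_server(res):
    """Pick the server with the smaller absolute clock offset,
    breaking ties on round-trip delay."""
    return min(res, key=lambda s: (abs(res[s]["offset"]), res[s]["delay"]))

print(better_server(results))  # -> ntpmk.net
```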


From the tests performed, it can be concluded that ntpmk.net provides a much better time offset than the time.windows.com NTP server. The ntpmk.net NTP server provides unmatched results (see Fig. 14) because of the geolocation of the server. The route to the time.windows.com NTP server goes over 46 hops, with the connection crossing the Atlantic (see Fig. 13), whereas the connection to ntpmk.net takes just two hops, terminating within the country. Registration was made at the Network Time Foundation, which provides public NTP support services and hosts the IETF NTP Working Group [24]. As mentioned, NTPMK is a Stratum 2 server, and it was successfully registered as a secondary time server. With this, NTPMK became the first official, fully functional public NTP Stratum 2 server for the Republic of North Macedonia.

5. CONCLUSION

The purpose of this research paper and practical project is to create the first free public NTP server for time synchronization in the Republic of North Macedonia and the rest of Europe. Before the realization of our project, all clients and applications were forced to use an external NTP server outside the country, and thus suffered very high connection lag and low quality of service. After presenting the basics of the NTP protocol versions, their vulnerabilities, and the structure of SNTP servers, we presented our setup and requirements for this project. The server is built on the latest VMware ESXi hypervisor for greater scalability and reliability. The NTP server has been successfully deployed in production and registered in the official NTP register. With this, clients using NTP services, devices or applications in the Republic of North Macedonia and the rest of Europe will have better service (time offset), while ISPs' main peering uplinks will be significantly offloaded and security exposure will be reduced. As future work, the NTPMK server can be added to the official public pool of NTP servers, which would further benefit the cause by serving a larger number of clients. Anyone using this pool for time synchronization would then obtain time from randomly chosen time servers, selected with a Round Robin algorithm dedicated to the client's geolocation region; many IoT and device makers use pool.ntp.org as their default NTP server. With our NTPMK server, a first publicly available server now serves our region, free for all to use, benefiting both clients and ISPs by giving them cleaner traffic to handle.


References

1. IETF, RFC-5905 (2010) Network Time Protocol Version 4: Protocol and Algorithms Specification. Available at https://tools.ietf.org/html/rfc5905 (last accessed on May 2, 2019)
2. NTP History (2019) History of NTP. Available at http://www.ntp.org/ntpfaq/NTP-s-def-hist.htm (last accessed on May 4, 2019)
3. IETF, RFC-778 (1981) DCNET Internet Clock Service. Available at https://tools.ietf.org/html/rfc778 (last accessed on May 3, 2019)
4. IETF, RFC-958 (1985) Network Time Protocol (NTP). Available at https://tools.ietf.org/html/rfc958 (last accessed on May 2, 2019)
5. IETF, RFC-1059 (1988) Network Time Protocol (Version 1) Specification and Implementation. Available at https://tools.ietf.org/html/rfc1059 (last accessed on May 2, 2019)
6. IETF, RFC-2405 (1998) The ESP DES-CBC Cipher Algorithm with Explicit IV. Available at https://tools.ietf.org/html/rfc2405 (last accessed on May 2, 2019)
7. IETF, RFC-1305 (1992) Network Time Protocol (Version 3) Specification, Implementation and Analysis. Available at https://www.ietf.org/rfc/rfc1305.txt (last accessed on July 27, 2018)
8. IETF, RFC-2030 (1996) Simple Network Time Protocol (SNTP) Version 4 for IPv4, IPv6 and OSI. Available at https://tools.ietf.org/html/rfc2030 (last accessed on May 1, 2019)
9. IANA (2019) IPv4 Multicast Address Space Registry. Available at https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml (last accessed on May 8, 2019)

10. Cisco Press, Internet Protocol Journal, Volume 15, No. 4 (2012) Protocol Basics: The Network Time Protocol. Available at https://www.cisco.com/c/en/us/about/press/internet-protocol-journal/back-issues/table-contents-58/154-ntp.html (last accessed on July 1, 2017)
11. Time and Date (2012) International Atomic Time. Available at https://www.timeanddate.com/time/international-atomic-time.html (last accessed on October 22, 2017)
12. Insights (2017) Best Practices for NTP Services. Available at https://insights.sei.cmu.edu/sei_blog/2017/04/best-practices-for-ntp-services.html (last accessed on May 14, 2019)
13. EECIS UDEL (2019) NTP Security Analysis. Available at https://www.eecis.udel.edu/~mills/security.html (last accessed on May 3, 2019)
14. Infosecinstitute (2014) Network Time Protocols, NTP Threats and Countermeasures. Available at https://resources.infosecinstitute.com/network-time-protocol-ntp-threats-countermeasures/#gref (last accessed on June 21, 2017)
15. Techtarget-Search Virtualization (2011) VMware ESXi Features. Available at https://searchservervirtualization.techtarget.com/tip/Understanding-VMware-ESXi-features (last accessed on July 29, 2019)
16. Cockpit Project (2013) Fedora Cockpit. Available at https://cockpit-project.org/ (last accessed on May 8, 2019)
17. Mikrotik Software (2014) RouterOS. Available at https://mikrotik.com/software (last accessed on April 9, 2019)
18. Mikrotik Winbox (2014) Winbox. Available at https://mikrotik.com/download (last accessed on May 16, 2019)


19. Cisco OpenDNS (2015) OpenDNS. Available at https://www.opendns.com/ (last accessed on May 21, 2019)
20. Mikrotik NAT (2010) Manual:IP/Firewall/NAT. Available at https://wiki.mikrotik.com/wiki/Manual:IP/Firewall/NAT (last accessed on February 21, 2019)
21. Name Cheap Hosting (2000) Name Cheap. Available at https://www.namecheap.com/ (last accessed on May 22, 2019)
22. Visual Trace Route (2016) Open Visual Trace Route. Available at https://www.visualtraceroute.net/ (last accessed on November 22, 2018)
23. IBM Knowledge Center (2012) Ntpdate Command. Available at https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.cmds4/ntpdate.htm (last accessed on May 22, 2019)
24. NTP Network Time Foundation (2007) Network Time Foundation's NTP. Available at http://support.ntp.org/bin/view/Main/WebHome (last accessed on May 20, 2019)


Risk analysis and risk management in IT industry

Aleksandar Kovachev and Vesna Dimitrova

University “Ss. Cyril and Methodius”, Faculty of Computer Science & Engineering Skopje, Macedonia [email protected], [email protected]

Abstract. The more our business relies on IT, the more important it is to identify and control the risks that could affect the IT systems. Threats ranging from equipment failure to malicious attacks by hackers have the potential to disrupt critical business systems or open up access to confidential data. This paper explains what IT risk is and the different types of IT risk that can affect us. It outlines how to carry out an IT risk assessment and how to develop an effective IT risk management process. It also explains IT security management standards and policies, and the interconnected role of IT risks and business continuity, since there are many standards for IT security, such as ISO 27000, others, such as ISO 12207 for systems engineering, that include some risk management issues, and standards dedicated to risk management, like ISO 31000. Finally, it gives a simple IT risk management checklist to help assess and address any technology risks in the business.

Keywords: Risk Management · Threat · Risk Types · Risk Categories · IT Industry · Incident Management · Risk Standards.

1 Introduction

Our economy, society and individual lives have been transformed by digital technologies. They have enabled improvements in science, logistics, finance, communications and a whole range of other essential activities. As a consequence, we have come to depend on digital technologies, and this leads to very high expectations of how reliable these technologies will be. There are many standards for IT security, such as ISO 27000, many others, such as ISO 12207 for systems engineering, that include some risk management issues, and standards for risk management, like ISO 31000, that can be followed [6] [7] [5]. Every organization has to make difficult decisions about how much time and money to spend protecting its technology and services; one of the main goals of risk management is to inform and improve these decisions. People have had to deal with dangers throughout history, but it is only relatively recently that they have been able to do so in a way that systematically anticipates and aspires to control risk. This paper is based on research into and a survey of existing and proven works, and it can be found useful to:
─ Identify the common risks associated with a small business;
─ Identify the external and internal factors which affect risk for a small business;


─ Identify situations that may cause risk for a small business;
─ Identify the common warning signs of risk for a small business;
─ Implement, monitor, and evaluate a risk management plan for a small business.

Risk management is the process that allows IT managers to balance the operational and economic costs of protective measures and achieve gains in mission capability by protecting the IT systems and data that support their organizations' missions. This process is not unique to the IT environment; indeed, it pervades decision-making in all areas of our daily lives [11].

There are many works related to this field of research. Some of them emphasize the management of risks, some analyze the nature of risks, and some develop risk management tools for greater project success. In [12] the authors present a revised risk framework for analyzing IT outsourcing and the implications of these findings in practice. In [1] the authors indicate new directions for research on the relation between risk management and project success, presenting its key elements: stakeholder perception of risk and success, and stakeholder behavior in the risk management process. In [8] the authors discuss approaches to risk analysis appropriate for IT and present some suggested tools for risk analysis and management.

In the content below, Section 2 describes the different IT risk categories and types, and Section 3 discusses the impact of IT risk management failure on business. Section 4 discusses the meaning of IT risk management, the steps of the IT risk management process, and the risk elements and controls, together with the IT risk assessment methodology and its types. Section 5 presents an IT risk management checklist for more effective management of IT risks in the business. The paper closes with a conclusion and a list of references.

2 Risk in IT Industry

Information technology (IT) risk is any threat to business data, critical systems and business processes. It is the risk associated with the use, ownership, operation, involvement, influence and adoption of IT within an organization. IT risks have the potential to damage business value and often stem from poor management of processes and events.

2.1 Categories of IT Risks

IT risk spans a range of business-critical areas, such as:
─ security – ex. compromised business data due to unauthorized access or use;
─ availability – ex. inability to access the IT systems needed for business operations;
─ performance – ex. reduced productivity due to slow or delayed access to IT systems;


─ compliance – ex. failure to follow laws and regulations (ex. data protection).

IT risks vary in range and nature. It is important to be aware of all the different types of IT risk potentially affecting our business.

2.2 Different Types of IT Risk

The IT systems and the information that the company holds on them face a wide range of risks. If the business relies on technology for key operations and activities, we need to be aware of the range and nature of those threats. Threats to the IT systems can be external, internal, deliberate and unintentional. Most IT risks affect one or more of the following:
─ Business or project goals – ex. failure to deliver the project on time;
─ Service continuity – ex. inability to continue with further development of new functionalities;
─ Bottom line results – ex. business not advancing because of continuous bad performance;
─ Business reputation – ex. bad reputation gained by bad management decisions regarding the project risks;
─ Security – ex. security breach in the IT sector;
─ Infrastructure – ex. failure of company technical infrastructure.

Looking at the nature of risks, it is possible to differentiate between:
─ Physical threats – resulting from physical access or damage to IT resources such as the servers. These could include theft, damage from fire or flood, or unauthorized access to confidential data by an employee or outsider;
─ Electronic threats – aiming to compromise the business information – ex. a hacker could get access to the website, the IT system could become infected by a computer virus, or we could fall victim to a fraudulent email or website. These are commonly of a criminal nature;
─ Technical failures – such as software bugs, a computer crash or the complete failure of a computer component. A technical failure can be catastrophic if, for example, we cannot retrieve data from a failed hard drive and no backup copy is available;
─ Infrastructure failures – such as the loss of the internet connection, which can interrupt the business;
─ Human error – a major threat – ex. someone might accidentally delete important data, or fail to follow security procedures properly.

3 Potential Impact of IT Failure in Business

For businesses that rely on technology, events or incidents that compromise IT can cause many problems. For example, a security breach can lead to:


─ identity fraud and theft;
─ financial fraud or theft;
─ damage to reputation;
─ damage to brand;
─ damage to the business's physical assets.

Failure of IT systems due to downtime or outages can result in other damaging and diverse consequences, such as:
─ lost sales and customers;
─ reduced staff or business productivity;
─ reduced customer loyalty and satisfaction;
─ damaged relationships with partners and suppliers.

If IT failure affects our ability to comply with laws and regulations, then it could also lead to:
─ breach of legal duties;
─ breach of client confidentiality;
─ penalties, fines and litigation;
─ reputational damage.

If technology enables the connection to customers, suppliers, partners and business information, managing IT risks in the business should always be a core concern. We can assess and measure IT risks in different ways; the following sections show how to carry out an IT risk assessment and explain the IT risk management process [4].

4 IT Risk Management

In business, IT risk management entails a process of identifying, monitoring and managing potential information security or technology risks with the goal of mitigating or minimizing their negative impact. Examples of potential IT risks include security breaches, data loss or theft, cyberattacks, system failures and natural disasters.

4.1 Why Risk Management Matters

Risk management exists to help us create plans for the future in a deliberate, responsible and ethical manner. This requires risk managers to explore what could go right or wrong in an organization, a project or a service, while recognizing that we can never fully know the future as we try to improve our prospects. Risk management is often perceived as a technocratic and dull profession. In reality, it is about analyzing our options and their future consequences, and presenting that information in an understandable, usable form to improve decision making [10].


4.2 Steps in the IT Risk Management Process

To manage IT risks effectively, we need to follow these six steps in the risk management process:

• Identify risks – determine the nature of risks and how they relate to the business, looking at the different types of IT risk;
• Assess risks – determine how serious each risk is to the business and prioritize them by carrying out an IT risk assessment;
• Mitigate risks – put in place preventive measures to reduce the likelihood of a risk occurring and limit its impact, finding solutions in the IT risk management checklist;
• Develop incident response – set out plans for managing a problem and recovering the operations, forming the IT incident response and recovery strategy;
• Develop contingency plans – ensure that the business can continue to run after an incident or a crisis;
• Review processes and procedures – continue to assess threats and manage new risks.

4.3 Elements of Risk

Once we have identified our scope, most component-driven approaches require the risk analyst to assess three elements of risk, typically described by the terms impact, vulnerability and threat. Risk is the intersection of impact, threats and vulnerabilities: for a risk to be realized, a force (threat) overcomes a resistance (vulnerability), causing an impact (bad things). We are familiar with measuring forces and resistances (resistance is a force in the opposite direction), which is why we also see the abused formula Risk = Likelihood * Impact, in which threat and vulnerability, both being forces, are combined into a new "likelihood" [9].

Impact. Impact is the consequence of a risk being realized. When conducting component-driven risk assessments, impact is usually described in terms of the consequences of a given asset being compromised in a given way. This impact is described in different ways, but one of the more common techniques is to assess the impact on the confidentiality, integrity and availability of information. For example, an impact might be described in terms of a loss of confidentiality of a customer dataset, or the corruption (loss of integrity) of the company's accounts. A loss of one of these properties can be connected to other types of consequence, such as lost money, loss of life, delays to projects, or any other kind of undesirable outcome.

Vulnerability. A vulnerability is a weakness in a component that would enable an impact to be realized, either deliberately or by accident. For instance, a vulnerability could be a piece of software which allows a user to illegitimately increase their account privileges, or a weakness in a business process (e.g. not properly checking the identity of someone before issuing them credentials for access to an online system or service). Regardless of the type of vulnerability in question, it is something which can be used to cause an impact.


Threat. Threat is the individual, group or circumstance which causes a given impact to occur. For example, this could be:
─ a lone hacker, or a state-sponsored group;
─ a member of staff who has made an honest mistake;
─ a situation beyond the control of the organization (such as high-impact weather).

The purpose of assessing threat is to improve the assessment of how likely a given risk is to be realized. Typically, when assessing a human threat actor, analysts consider people who are likely to want to harm the organization. This then allows them to consider both the capability and intent of these groups. Risk experts use threat taxonomies and categorization methods to provide a common language of threat capability. One of the weaknesses of threat assessments is that a threat's capability can change rapidly, and reliable information on these changes can be hard to come by. Because of this, do not treat assessments of threat capability as if they were static, and make sure to recognize the uncertainty and variability in the threat assessments.

4.4 IT Risk Controls

As part of risk management, we need to try to reduce the likelihood of risks affecting the business in the first place, and to put in place measures to protect our systems and data from all known threats. Regarding this, we should:
─ Review the information we hold and share. Make sure that we comply with data protection legislation and think about what needs to be on public or shared systems. Where possible, remove sensitive information;
─ Install and maintain security controls, such as firewalls, anti-virus software and processes that help prevent intrusion. See how to protect our business online;
─ Implement security policies and procedures, such as internet and email usage policies, and train staff. Follow best practice in cyber security for business;
─ Use a third-party IT provider if the business lacks in-house skills. Often, such a provider can supply its own security expertise. See how to choose an IT supplier for our business.

If we cannot remove or reduce risks to an acceptable level, we may be able to take action to lessen the impact of potential incidents. We should consider:
─ setting procedures for detecting problems (ex. a virus infecting the system);
─ getting insurance against the costs of security breaches.

4.5 IT Risk Assessment Methodology

When assessing the IT risks in the business, we should avoid spending too much time and money reducing risks that may pose little or no threat to the business. Instead, we should focus on the most serious risks, based on:


─ The likelihood of the risk happening – the probability of occurrence of an impact that affects the environment;
─ The cost or impact if it does happen – the environmental impact if an event occurs.

This is known as the quantitative risk assessment.

Risk = Consequence × Likelihood (1)

The C × L matrix method therefore combines the scores from the qualitative or semi-quantitative ratings of consequence (levels of impact) and likelihood (levels of probability) that a specific consequence will occur (not just any consequence) to generate a risk score and risk rating. Essentially, the higher the probability of a "worse" effect occurring, the greater the level of risk. This C × L risk assessment process involves selecting the most appropriate combination of consequence and likelihood levels that fits the situation for a particular objective, based upon the information available and the collective knowledge of the group (including stakeholders, academics, managers, industry, researchers and technical staff) involved in the assessment process [9]. When making decisions about appropriate combinations of consequence and likelihood, if more than one combination is considered plausible, the combination with the highest risk score should be chosen (this is consistent with taking a precautionary approach) [2].
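As a sketch, the C × L method can be expressed as a lookup from ordinal consequence and likelihood levels to a score and rating. The level names and rating thresholds below are illustrative assumptions, not values from the paper:

```python
# Ordinal scales (1 = lowest). Names and thresholds are illustrative.
CONSEQUENCE = {"minor": 1, "moderate": 2, "major": 3, "severe": 4}
LIKELIHOOD = {"rare": 1, "possible": 2, "likely": 3, "almost certain": 4}

def risk_rating(consequence, likelihood):
    """Risk = Consequence x Likelihood (Eq. 1), mapped to a rating."""
    score = CONSEQUENCE[consequence] * LIKELIHOOD[likelihood]
    if score >= 12:
        rating = "high"
    elif score >= 6:
        rating = "medium"
    else:
        rating = "low"
    return score, rating

print(risk_rating("major", "likely"))           # -> (9, 'medium')
print(risk_rating("severe", "almost certain"))  # -> (16, 'high')
```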

4.6 Types of IT Risk Assessment Methodologies

Generally speaking, there are two main types of risk assessment:

─ qualitative risk assessment;
─ quantitative risk assessment.

A quantitative assessment combines the probability of a risk occurring with the costs of impact and recovery. For example, if we assess a risk to have a high probability of happening and a potentially high cost/impact to the business, we will deem it high risk. Those that are unlikely to happen, or will cost little or nothing to remedy, fall under the category of low risk. Quantitative measures of risk like this are only meaningful when we have good data. We may not always have the necessary historical data to work out probability and cost estimates on IT-related risks, since they can change very quickly.

A more practical approach may be to use a qualitative assessment. This relies on using our judgement to decide whether the probability of occurrence is high, medium or low. For example, we might classify as 'high probability' something that we expect to happen several times a year. We can do the same for cost/impact in whatever terms seem useful, for example:

─ low - would lose up to half an hour of production;


─ medium - would cause complete shutdown for at least three days;
─ high - would cause irrevocable loss to the business.

We might then decide to rank risks according to their significance to the business. Once we establish what the risks are and how important they are to the business, we will be able to decide whether to accept the risks or manage them.

Sometimes, it may be best to use a mixed approach to IT risk assessments, combining elements of both quantitative and qualitative analysis. We can use the quantitative data to assess the value of assets and loss expectancy, but also involve people in our business to gain their expert insight. This may take time and effort, but it can also result in a greater understanding of the risks and better data than either method would provide alone.
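The qualitative ranking step can be sketched as follows; the risks and the ordinal weights for the high/medium/low labels are invented assumptions for illustration:

```python
# Assumed ordinal weights for the qualitative labels; the risks are invented examples.
PROB_WEIGHT = {"low": 1, "medium": 2, "high": 3}
IMPACT_WEIGHT = {"low": 1, "medium": 2, "high": 3}

risks = [
    {"name": "virus infection", "probability": "high", "impact": "medium"},
    {"name": "server room fire", "probability": "low", "impact": "high"},
    {"name": "brief power cut", "probability": "medium", "impact": "low"},
]

def significance(risk):
    # Combine the qualitative judgements into a single comparable score.
    return PROB_WEIGHT[risk["probability"]] * IMPACT_WEIGHT[risk["impact"]]

# Rank the risks by their significance to the business, most serious first.
for risk in sorted(risks, key=significance, reverse=True):
    print(risk["name"], significance(risk))
```

Replacing the assumed weights with monetary loss expectancy where data exists turns this into the mixed quantitative/qualitative approach described above.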

4.7 Common Examples of Risk Management

Regardless of a company's business model, industry or level of earnings, business risks must be identified as a strategic aspect of business planning. Once risks are identified, companies take the appropriate steps to manage them to protect their business assets. The most common types of risk management techniques include avoidance, mitigation, transfer, and acceptance [3].

Avoidance of Risk. The easiest way for a business to manage its identified risk is to avoid it altogether. In its most common form, avoidance takes place when a business refuses to engage in activities known or perceived to carry a risk of any kind. For instance, a business could forgo purchasing a building for a new retail location, as the risk of the venue not generating enough revenue to cover the cost of the building is high.

Risk Mitigation. Businesses can also choose to manage risk through mitigation or reduction. Mitigating business risk is meant to lessen any negative consequence or impact of specific, known risks, and is most often used when those risks are unavoidable. For example, software companies mitigate the risk of a new program not functioning correctly by releasing the product in stages. The risk of capital waste can be reduced through this type of strategy, but a degree of risk remains.

Transfer of Risk. In some instances, businesses choose to transfer risk away from the organization. Risk transfer typically takes place by paying a premium to an insurance company in exchange for protection against substantial financial loss. For example, project insurance can be used to protect a company from the costs incurred when a project or other client-related work does not perform well.


Risk Acceptance. Risk management can also be implemented through the acceptance of risk. Companies retain a certain level of risk brought on by specific projects or expansion if the anticipated profit generated from the activity is far greater than its potential risk. For example, software companies often utilize risk retention or acceptance when developing a new product. The cost of research and development does not outweigh the potential revenue generated from the sale of the new product, so the risk is deemed acceptable.

5 IT Risk Management Checklist

Risk management can be relatively simple if we follow some basic principles. To manage the IT risks to the business effectively, we do the following:

─ Think about IT security from the start when planning or updating the IT system. Discuss the needs and potential problems with the system users;
─ Actively look for IT risks that could affect the business. Identify their likelihood, costs and impact. Carry out a comprehensive IT risk assessment;
─ Think about the opportunity, capability and motivation behind potential attacks. Understand the reasons for a cyber-attack;
─ Assess the seriousness of each IT risk and focus on those that are most significant;
─ Understand the relevant laws, legislation and industry guidelines, especially if we have to comply with the General Data Protection Regulation (GDPR);
─ Configure the PCs, servers, firewalls and other technical elements of the system. Keep software and hardware equipment up to date. Put in place other common cyber security measures and read about securing the wireless network;
─ Do not rely on just one technical control (e.g. a password). Use two-factor authentication to guarantee user identity, e.g. something we have (such as an ID card) and something we know (a PIN or password);
─ Develop data recovery and backup processes and consider daily back-ups to offsite locations;
─ Support technical controls with appropriate policies, procedures and training. Understand the most common insider threats in cyber security;
─ Make sure that we have a business continuity plan. This should cover any serious IT risk that we cannot fully control. Regularly review and update our plan;
─ Establish effective IT incident response and recovery measures, as well as a recording and management system. Simulate incidents to test and improve our incident planning, response and recovery;
─ Develop and follow specific IT policies and procedures, such as on email and internet use, and make sure our staff know what falls under acceptable use;
─ Consider certification to the IT security management standards for the business and trading partners.


In any type of project planning, risk management is a necessary tool. Risk management identifies and prioritizes risks, measures how harmful they can be, and develops a plan to deal with risks that are a threat to the project. As we develop our risk management plan, including the risks and how they will be dealt with, a risk checklist should quickly tell us, from past experience and forecasting, whether a risk area will evolve.

6 Conclusion

In most organizations and companies, the network itself will continually be expanded and updated, its components changed, and its software applications replaced or updated with newer versions. In addition, personnel changes will occur, and security policies are likely to change over time. These changes mean that new risks will surface and risks previously mitigated may again become a concern. Thus, the risk management process is ongoing and evolving.

This paper analyzes the risks in the IT industry and how they are handled in today's IT companies, and emphasizes risk management, the need for ongoing risk evaluation and assessment, and the factors that lead to a successful risk management program, also making a path for possible future practical research in this field.

In conclusion, risk management is crucial for running a successful business, and there should be a specific schedule for assessing and mitigating mission risks within the company; the periodically performed process should also be flexible enough to allow changes where warranted, such as major changes to the IT system and processing environment resulting from new policies and technologies.

References

1. De Bakker, K., Boonstra, A., Wortmann, H.: Does risk management contribute to IT project success? A meta-analysis of empirical evidence. International Journal of Project Management 28(5), 493–503 (2010)
2. Food and Agriculture Organization of the United Nations: Qualitative Risk Analysis, http://www.fao.org/fishery/eaf-net/eaftool/eaf_tool_4/en
3. Horton, M.: Common Examples of Risk Management, Trading Skills & Essentials (2018)
4. Invest Northern Ireland: the official online channel for business advice and guidance in Northern Ireland, https://www.nibusinessinfo.co.uk/content/what-it-risk
5. ISO 31000 Risk management, https://www.iso.org/iso-31000-risk-management.html
6. ISO/IEC 27001 Information security management, https://www.iso.org/isoiec-27001-information-security.html
7. ISO/IEC/IEEE 12207:2017 Systems and software engineering — Software life cycle processes, https://www.iso.org/standard/63712.html
8. McGaughey Jr, R.E., Snyder, C.A., Carr, H.H.: Implementing information technology for competitive advantage: risk management issues. Information & Management 26(5), 273–280 (1994)


9. Policy-oriented marine Environmental Research for the Southern European Seas (PERSEUS): United Nations supported organization, http://www.perseus-net.eu/site/content.php?artid=2204
10. Risk management guidance from the National Cyber Security Centre in the UK, https://www.ncsc.gov.uk/collection/risk-management-collection?curPage=/collection/risk-management-collection/essential-topics/introduction-risk-management-cyber-security-guidance
11. Stoneburner, G., Goguen, A., Feringa, A.: Risk management guide for information technology systems. NIST Special Publication 800-30 (2002)
12. Willcocks, L.P., Lacity, M.C., Kern, T.: Risk mitigation in IT outsourcing strategy revisited: longitudinal case research at LISA. The Journal of Strategic Information Systems 8(3), 285–314 (1999)


A Review-based Survey of Mobile Student Services

Sasho Gramatikov1, Aleksandar Spasov1, Ana Gjorgjevikj1, and Dimitar Trajanov1

Faculty of Computer Science and Engineering, Ss Cyril and Methodius University in Skopje, North Macedonia [email protected], [email protected], [email protected], [email protected]

Abstract. Mobile services tend to become ubiquitous. Following the trend of mobile applications becoming part of everyday life in all areas, there is a need to put mobile services into action for education purposes. The intent of this paper is to give an overview of the mobile student services that are currently implemented and used by a list of acknowledged, highly ranked world universities. Apart from the overview of their features, the paper analyses the users' feedback in the form of ratings, reviews and downloads from the mobile application markets and uses this data to measure the influence of their features on the overall users' satisfaction and to highlight the directions of future mobile services.

Keywords: Mobile student services · University mobile applications · Feedback analysis.

1 Introduction

Smartphones have managed to get their place in every branch of people's lives: establishing phone calls, Internet browsing, instant messaging, business, working with different applications while on the go. Due to their increasing number of features, processing power, storage capacity and battery endurance, their number has been rapidly increasing from day to day. Thus, the number of mobile internet users in 2018 was about 3.3 billion, and it has a serious tendency to become even bigger in the upcoming years [17]. A considerable part of these users are young people. In 2018, 94% of the young people in the US aged between 18 and 29 owned and used a smartphone, and 28% of them used their smartphones at home with no broadband connectivity [3]. The young population has become dependent on smartphones, and the major reasons for this are the online services and the easy Wi-Fi Internet access through which they reach these services [19]. This convenience results in over 77% of Americans going online on a daily basis [2]. Apart from the vast use of smartphones in everyday life activities, they also take part in the educational process. More than 44% of young adults are accessing


educational materials [4], which proves that mobile devices can be a great tool for improving the process of education [6].

Although smartphones are widely accepted by the students, there are still obstacles that slow down their deeper penetration into the educational process, such as distraction of students during classes [15], university regulations that prohibit their use in classrooms, the need for regular battery charging, lack of faculty member cooperation and limited charging points, to name a few [8]. However, as the universities address these problems by removing the technical obstacles, accepting the importance of smartphones and providing proper professional training to the educators, smartphones are quickly becoming an incredible classroom resource [11]. The students are not alone in the use of smartphones. A significant part of the academic staff have replaced their laptops with smartphones, which they use not only for personal purposes, but also for sharing knowledge with the students and supervising their projects and student activities [7]. There are many examples where mobile phones are part of the educational process in both secondary and higher schools and prove to affect the process in a positive manner [13]. Apart from the educational process, smartphones impact students' social life, physical activities and psychological well-being [15].

Universities around the world are using this advantage and have created mobile applications and services that can be used by their students to access the learning materials, to be up to date with the events and to collaborate with each other [5, 22, 14]. Another crucial importance of these services is that they help the students, especially the international students [21] and those who move to new cities, to integrate into the new environment and become part of communities sharing similar ideas and interests [19].
By means of smartphones, the students and the academic staff also become part of virtual communities, which enables them to be part of the everyday student life even if they are not present on the campus premises [16]. A mobile student service is an application based on mobile computing and context-aware application concepts that can provide more user-centric information and services to the students [24, 9].

Because of the numerous advantages of mobile applications over web applications, and the preference of the students to use a native application rather than the existing web applications on their smartphones [10], most of the universities that follow the technological advances move towards utilization of mobile applications as part of their educational process. "Because of the characteristics of mobility, mobile learning has become the main direction for the future development of higher education, and also drawn attention in the information construction of a new stage in universities" [18]. The advantage of using mobile devices in education is that they provide the students easy and quick access to educational materials and answers to their questions in a comfortable environment, and at the same time, they still give them the opportunity to be in social interaction with the other students or teachers [1, 16]. Therefore, many of the world-recognized universities have invested in new mobile applications and use them to assist in the education


process. Other universities use their existing information systems and create new mobile applications to access the existing services in a more convenient way [23].

The main purpose of the universities when selecting and designing the features of the mobile applications is that they meet the needs and requirements of the students from both an educational and a social perspective. According to the analysis conducted by the authors of [12], the most preferred characteristics of the mobile applications are their availability and ease of use, while the entertaining features are the least expected among students. Although the analysis considers the students' satisfaction, it is based on a questionnaire among a geographically local group of students. In [20], a continent-scale analysis of the features typical of the mobile applications of the Australian universities is conducted; however, there is no information about the students' satisfaction.

The goal of this research is to make a global feature-wise overview and comparison of the existing world universities' mobile applications, taking into consideration the users' satisfaction, aiming to find a correlation between the availability of certain features and users' satisfaction. To our knowledge, no similar work that analyses both the features and the users' satisfaction has been published to date. In our work, we used the QS list as a starting point for choosing the world universities. Then, for each university, a search for a mobile application was made in the Google Play Store and in the iTunes App Market. Afterwards, each representative application found for the university was installed and its features were analyzed. We conducted the same analysis in two different time periods: in the year 2016 and in 2018. Our search on the application market resulted in a total of approximately 50 downloaded and installed mobile student applications. It is important to mention that one half of these universities are from the USA. The reason for such dominance of universities from the USA is that this country occupies most of the top 100 positions in the university world ranking.
The rest of the universities considered in this work are from other parts of the world, mostly from Europe, which gives a wider perspective of the use of the mobile applications in the world. For every installed application we collected the ratings and the reviews of the users and used them to evaluate their popularity and to pinpoint their advantages and disadvantages based on the users' descriptive opinions.

The focus of this paper is an overview of the services offered by the student mobile applications, their contribution to the overall rating of the application, as well as an analysis of the users' quality of experience when using these services.

The rest of the article is organized as follows: in Section 2 we give a short overview of the different mobile application platforms considered in our work and the features of a set of downloaded mobile applications of some world top universities. Then, in Section 3, we analyze the results obtained from the users' feedback in the form of ratings and reviews. After the discussion in Section 4, we eventually draw the conclusions from our work.


2 Overview of mobile application platforms and services

The mobile student services used by the universities on our list are developed for the different leading platforms on the market. Most of them, 47 out of 50, are built to support both the Android and iOS mobile platforms. The rest of the universities support only the iOS or the Android mobile platform. The third place among the well-known mobile platforms is taken by BlackBerry, which is supported by only 7 universities. The fourth place is taken by the Windows Phone mobile platform, supported by only 3 universities. According to the basic services implemented in these applications, they can be divided into two major groups: open-access services, accessible by all users, and limited-access services, permitted to authenticated users only.

2.1 Open-access services

The open-access services allow all users to view and access basic information about the university, the educational process and diverse university services without the need for previous authentication. Table 1 lists the open-access services that we identified and their frequency of appearance in the mobile applications.

Table 1: Overview of open-access services

Name               | Description                                                                        | Freq.
News and events    | Preview of news published and events related to the university                     | 45
Location           | Guides the students and the visitors within the campus                             | 41
Course info        | List of courses and programs of the university                                     | 40
Library            | Access and preview of the books of the university's library                        | 33
Public transport   | Eases the access to the university campus with public transport                    | 25
Office hours       | Office hours of the lecturers                                                      | 24
Emergency contacts | Emergency phone numbers                                                            | 22
Sport              | Information about sport activities                                                 | 16
Communication      | Communication between two or more users                                            | 16
Food               | List of restaurants within the campus and menus                                    | 14
Multimedia         | Sharing of multimedia content among the students                                   | 14
Admission          | Information for students that have applied or want to apply for a study program    | 10
Photo              | Gallery of photos provided by the university                                       | 9
Broadcast          | Real-time broadcast of the student radio and TV                                    | 5
Career             | Overview of active jobs, internships, scholarships and volunteering announcements  | 5
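Frequency columns like the one in Table 1 can be reproduced by counting, over the surveyed applications, how many implement each service. A small sketch with invented university names and service sets:

```python
# Hypothetical sketch: derive service frequencies from per-application service sets.
# The university names and service sets below are invented examples, not survey data.
from collections import Counter

app_services = {
    "University A": {"News and events", "Location", "Course info"},
    "University B": {"News and events", "Library"},
    "University C": {"News and events", "Location"},
}

# Count in how many applications each service appears.
freq = Counter(s for services in app_services.values() for s in services)

for service, count in freq.most_common():
    print(service, count)
```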


2.2 Limited-access services

The limited-access services are features of the mobile applications available to authenticated users only. Compared to the open-access services, these services provide more information and details about the university and the authenticated user. According to our survey, these services are not very common in the mobile applications, as can also be seen in Table 2.

Table 2: Overview of limited-access services

Name            | Description                                                        | Freq.
Contact         | Information for the students registered in the university database | 16
Grades          | Grades for enrolled courses                                        | 15
Schedule        | Schedule and location of classes for enrolled courses              | 14
News and events | News and events related to the user                                | 13
Exam            | Time and location of exams                                         | 13
Library         | Book reservation                                                   | 8
Career          | Career opportunities based on the user's curriculum                | 3
Assigned task   | Eases the access to the university campus with public transport    | 2

3 User feedback analysis

One of the major objectives of our survey on the mobile student services is to evaluate the users' experience and measure their satisfaction with using these services. In order to achieve this goal, we collected data from the official mobile application markets and their discussion forums, as well as from the official web sites of the universities. The data of interest for our work was the rating of the users, the number of reviews and their content, the approximate number of downloads, the number of features the applications contain and the world ranking of the universities.

Following the trend of mobile applications, a considerable part of the world universities have already implemented a mobile solution for accessing their information. According to the services covered, the most advanced application for unauthenticated users is available to the students of Harvard University. This application covered 14 out of a total of 17 services. From the users' perspective, this application has set the pattern of how a student application should look, since its design and action workflow managed to reduce the user actions needed to get to the proper place and to see the desired information.

In the category of applications for authenticated users, Ashford University developed the most advanced mobile application. It covers 7 out of 9 services


present in its category. Even though the two missing services are very handy for finding a contact of a colleague or a proper job, Ashford University has successfully implemented the modules related to the enrolled courses, information and notifications that are basic and much needed for the students as the target group that uses this application.

The applications of the Oxford and Harvard Universities are the ones with the most user-friendly features. The Oxford University application has an advanced method for finding the optimal route to reach the university using Google Maps and the user's location. The Harvard University application provides the easiest way to get to and search through the different sections. It allows more than one search criterion to be added as part of a search in every section. This feature allows consistency and easy accessibility in the different modules. On the scale from 1 to 5, the range of ratings in 2016 is from 2.1 to 4.6, and in 2018 from 2.6 to 4.7. From the information gathered from the application stores, the most downloaded application has 100,000-500,000 downloads and the least downloaded application has 1,000-5,000 downloads.

3.1 User ratings analysis

Our first goal was to find the influence of a single feature of the application on the overall application rating. Since the rating data we gathered refers to the user satisfaction with the entire set of services implemented in the applications, we measured the average rating of the applications that contain a certain feature based on four different criteria. For that purpose, we defined four different measures of the average rating of the applications depending on the presence of different services, correlated to parameters like the total number of implemented services, the number of downloads and the number of reviews.

Let us assume that there are n mobile applications in the data set A = {A_1, ..., A_n}. Each application from the data set can have up to m services belonging to the service set S = {S_1, ..., S_m}. The presence of the service S_j in the application A_i is marked with the variable a_{ij}, having value 1 if A_i implements the service and value 0 otherwise. Each application has d_i downloads, is rated with grade g_i and has r_i reviews from the users. Based on this definition, the first measure for the influence of a certain service S_f on the applications' rating is the average rating of all the applications that implement that service, expressed as:

R_f = \frac{\sum_{i=1}^{n} a_{if}\, g_i}{\sum_{i=1}^{n} a_{if}} \quad (1)

The second measure for the influence of service S_f on the applications' popularity is the average rating of all applications that implement the service, weighted according to their total number of services. The weight of the rating g_i of application i that implements service f, i.e., a_{if} = 1, is calculated as the normalized number of different services \sum_{j=1}^{m} a_{if} a_{ij} that the application implements. Thus, the ratings of the applications with more services will have a higher influence on the value of the average rating of the applications that implement the same service.


The value of this average rating is expressed as:

R'_f = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} a_{if} a_{ij}\, g_i}{\sum_{i=1}^{n} \sum_{j=1}^{m} a_{if} a_{ij}} \quad (2)

For the calculation of the third measure, we assume that each service j of the application i contributes an equal portion to its overall rating. Therefore, the weight of the rating of application i that implements the service f is calculated as the normalized portion of that service, \left(\sum_{j=1}^{m} a_{ij}\right)^{-1}, compared to the total number of services. With such a definition, a given service of an application with fewer services has a higher influence on the rating of that application, compared to an application with more services, because its presence contributes more to the overall application's rating. The expression for the average rating of the applications that implement service f is:

R''_f = \frac{\sum_{i=1}^{n} \left(\sum_{j=1}^{m} a_{ij}\right)^{-1} a_{if}\, g_i}{\sum_{i=1}^{n} \left(\sum_{j=1}^{m} a_{ij}\right)^{-1} a_{if}} \quad (3)

The fourth measure of influence gives more credit to the applications that are not only downloaded more, but have also stimulated the users that downloaded them to give feedback. The weight of the rating of each application i that implements service f is proportional to the normalized ratio of the number of reviews to the number of downloads, r_i/d_i. Hence, the average rating of the applications with service f is given as:

R'''_f = \frac{\sum_{i=1}^{n} (r_i/d_i)\, a_{if}\, g_i}{\sum_{i=1}^{n} (r_i/d_i)\, a_{if}} \quad (4)

Using these definitions and the information we obtained for the reviewed applications of the set A, we calculated the four different measures for each possible feature of the set S. Then, we calculated the average value of these results and used it to sort the influence of each feature on the rating of the application in descending order. The results of these calculations for the two different time periods are shown in Fig. 1. The series rate, norm by features, norm by feature presence and norm by reviews/downloads refer to the defined measures R_f, R'_f, R''_f and R'''_f, respectively. The last series, named average, is calculated as the average of the four series.

Fig. 1a shows that in 2016, the applications with the highest average user ratings implement the Pictures, Food, TV/Radio and Sports services. These services vary only slightly in the values of the average ratings and can be observed as a group of services that has a significantly higher influence on user satisfaction compared to the rest of the services. The latter services have an influence that decreases linearly within the entire range, except the last two services, the Admission and Office hours services, which do not follow the linearly decreasing fashion and contribute less to the satisfaction of


the users. One of the main reasons for the low popularity of the Office hours service is that, according to the users' comments, the data is sometimes outdated or not present at all.
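The four measures defined by equations (1)-(4) can be sketched in Python as follows; the presence matrix a, ratings g, downloads d and review counts r are invented toy data, not values from the survey:

```python
# Toy data: a[i][j] marks whether application i implements service j;
# g, d, r are the rating, download count, and review count of each application.
a = [[1, 0, 1],   # application 0 implements services 0 and 2
     [1, 1, 0],   # application 1 implements services 0 and 1
     [0, 1, 1]]   # application 2 implements services 1 and 2
g = [4.5, 3.0, 4.0]       # user ratings
d = [10000, 2000, 5000]   # downloads
r = [120, 30, 80]         # reviews

n, m = len(a), len(a[0])

def r1(f):
    """(1) Plain average rating of the applications implementing service f."""
    return sum(a[i][f] * g[i] for i in range(n)) / sum(a[i][f] for i in range(n))

def r2(f):
    """(2) Rating weighted by each application's total number of services."""
    num = sum(a[i][f] * sum(a[i]) * g[i] for i in range(n))
    den = sum(a[i][f] * sum(a[i]) for i in range(n))
    return num / den

def r3(f):
    """(3) Rating weighted by the inverse of each application's service count."""
    num = sum((a[i][f] / sum(a[i])) * g[i] for i in range(n))
    den = sum(a[i][f] / sum(a[i]) for i in range(n))
    return num / den

def r4(f):
    """(4) Rating weighted by the reviews-to-downloads ratio."""
    num = sum(a[i][f] * (r[i] / d[i]) * g[i] for i in range(n))
    den = sum(a[i][f] * (r[i] / d[i]) for i in range(n))
    return num / den

print(r1(0), r2(0), r3(0), r4(0))
```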


Fig. 1: Overview of the average contributions of features to application popularity in the year a) 2016 and b) 2018

Fig. 1b shows that, in 2018, the applications with the highest average user ratings implement the Sport, Multimedia, Pictures, and TV/Radio services. Compared to the year 2016, the Dining and food service is replaced by the Sport service among the top popular services. The distinguishing popularity of the services related to multimedia data is expected, since we are living in a modern age where the social platforms have deeply penetrated the everyday life of the general population, especially the young people, and the exchange of multimedia data has become a common practice. Another considerable result from the comparison is that the communication services in the year 2018 are among the least popular services, which can be explained by the fact that the current messaging and social mobile applications are so dominant and well established [15] that the communication service of the universities' applications has little chance of becoming a convenient tool for communication among the students.

Fig. 2: Weighted rating of mobile applications for ascending order of world university ranking in the year a) 2016 and b) 2018

In Fig. 2, we show the ratings of the analyzed mobile applications in the two different periods, organized in descending order based on their world ranking. As a reference, we used the QS world ranking list, which annually evaluates the universities based on their academic and employer reputation, number of citations, number of students and number of international students. The world's best university has rank 1, and each following university in the figure is assigned a value incremented by 1, although the rank difference is not necessarily 1. The figure shows only the order of these universities sorted by their rank. Some of the considered universities are not ranked in the QS world ranking, and therefore, we assume they have a lower rank and add them as the last items in the figure. Additionally, each university in the figure is represented with a circle with a size proportional to the number of downloaded applications. Fig. 2a shows that in the year 2016, on a macro level, the ratings of the applications follow a linear dependence on the university ranking. Their rating decreases as the world rank of


the university decreases. However, if we cluster the universities into regions based on the regularity of the ratings relative to the decreasing trend-line, we can see that the first region, the top-rated quarter of the universities (with real ranks from 1 to 16), consists of applications with relatively high user ratings placed above the decreasing line and following its direction. It is also important to note that the top three world universities not only have some of the best-rated applications, but have also offered their service to responsive users for the longest period; therefore, their ratings can be considered among the most objectively calculated values. The second region, formed by the middle range of the considered universities, consists of applications with the highest deviation of ratings relative to the trend-line. Compared to the other ratings, this region contains the worst-rated applications, but also some of the best rated. The last region also contains regular rating values placed near the trend-line. Even though this region consists of applications from the lowest-ranked universities, as well as from unranked universities, the applications have on average a higher rating than those of the better-ranked universities from the second region. The figure also shows that many of these ratings are based on a higher number of reviews. Contrary to the previous results, Fig. 2b shows that in the year 2018, on a macro level, the ratings of the applications do not depend on the university ranking, which can be seen from the trend-line that is almost parallel to the x-axis. If we cluster the universities into regions based on the regularity of the ratings relative to the trend-line, we can again see three regions, where the applications from the first and the third regions are above or near the trend-line, while the second (middle) region consists of applications with high deviation of ratings relative to the trend-line.
This observation leads to the conclusion that the students are equally demanding and critical of the mobile applications, no matter what quality of education the university offers. In the 2018 analysis, fewer mobile applications were considered, since some of the applications considered in 2016 were no longer on the market, or were still on the market but their latest ratings were outdated.
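The ordering used for Fig. 2 (QS-ranked universities first, in rank order, with unranked universities appended last and x-axis positions given by ordinal index rather than real rank) can be sketched as follows; the data and field names are hypothetical, for illustration only:

```python
# Order universities for plotting: QS-ranked first, unranked appended last.
# A qs_rank of None marks a university absent from the QS list.
universities = [
    {"name": "Uni A", "qs_rank": 3},
    {"name": "Uni B", "qs_rank": None},
    {"name": "Uni C", "qs_rank": 1},
]

def plot_order(unis):
    # Unranked universities sort after every ranked one; among the ranked,
    # the QS rank decides the order.
    ranked = sorted(unis, key=lambda u: (u["qs_rank"] is None, u["qs_rank"] or 0))
    # The x-axis position is the ordinal index + 1, regardless of the real
    # rank gaps between consecutive universities.
    return [(i + 1, u["name"]) for i, u in enumerate(ranked)]
```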

3.2 User reviews analysis

The feedback that users give in their comments is another important part of the survey. The comments are a descriptive counterpart to the user ratings and the quality of their experience, emphasizing the pros and cons of the features offered in the applications. For that purpose, we collected the comments and reviews of the analyzed applications written in English in the two periods, before the year 2016 and before the year 2018, and used their content to create word clouds. In our analysis, we preprocessed the users' feedback using standard procedures in natural language processing. First, we converted all words into lowercase. Then, we performed tokenization using NLTK (Natural Language Toolkit), which separates the words and the punctuation into string tokens. In the next pre-processing phase, we removed the so-called stop-words. We used the


NLTK English stop-words list, which currently contains 153 stop-words; however, we deliberately removed the words "no" and "not" from this list since they are essential for evaluating the connotation of the comments. In the last phase, we performed stemming and lemmatization of the obtained token array, which group and reduce inflected forms of a word so they can be analyzed as a single word. The stemming process uses a heuristic that crops the ends of words, while the lemmatization process uses a vocabulary and morphological analysis of the words to return their base form. We used the WordNet-based lemmatizer from NLTK to perform the last operation. After measuring the frequency of appearance of each single word in the pre-processed comments, we created word clouds. We deliberately omitted the word "app" since it is many times more frequent than any other word and adds nothing to the expression of the users' quality of experience. The single-word word cloud generated from the student service mobile application comments collected up to 2016 is shown in Fig. 3a. The size of a single word in the word cloud is proportional to its frequency of appearance in the pre-processed token array. According to the figure, the users have a positive general experience, since the most frequent word is "great". There are also other positive words like "love", "like" and "good". However, many users share a negative experience, which can be concluded from the frequent presence of the explicit negations "not", "no" and "can't". There are also implicitly negative words which express malfunction or a lack of up-to-date information in the latest application version, such as "fix", "update" and "need", or report a critical problem in a recent update. The word cloud from the comments collected between the years 2016 and 2018, shown in Fig. 3b, demonstrates that students have a much worse experience with the mobile applications intended for them compared to the earlier period.
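A minimal sketch of this preprocessing pipeline in plain Python; the toy stop-word list and the regex tokenizer below stand in for NLTK's tokenizer, its 153-word English stop-word list, and the WordNet lemmatizer actually used in the paper:

```python
import re
from collections import Counter

# Toy stand-in for the NLTK English stop-word list; as in the paper,
# the negations "no" and "not" are deliberately kept out of it.
STOP_WORDS = {"the", "is", "a", "an", "and", "it", "this", "to"}
STOP_WORDS -= {"no", "not"}  # keep negations for connotation analysis

def preprocess(comment):
    # 1) lowercase; 2) crude tokenization (NLTK's tokenizer in the paper);
    # 3) stop-word removal; the word "app" is dropped as in the paper.
    tokens = re.findall(r"[a-z']+", comment.lower())
    return [t for t in tokens if t not in STOP_WORDS and t != "app"]

comments = ["This app is great, not buggy", "The app crashes and it is slow"]
freq = Counter(t for c in comments for t in preprocess(c))
```

The `freq` counter then drives the word-cloud rendering, with each word's size proportional to its count.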
The most frequent words in the cloud are "work", "not" and "crash", which, combined, imply a negative experience regarding the stability of the applications. Apart from the students experiencing problems and malfunctioning features, there are satisfied users who consider the mobile applications to be great and useful for obtaining information about their classes. Although the word clouds in Fig. 3 show the general connotation of the users' experience, the overall opinion of the users cannot be precisely depicted, since some of the words may appear in both positive and negative connotations. A typical example is the word "work", which students can use to complain that the application is not working properly or to emphasize that the application works flawlessly. Other such frequent verbs are "open", "access", "show" and "read". Therefore, we added another step to the preprocessing of the comments, which pairs consecutive words into so-called bigrams, and measured their frequency of appearance. Based on the frequency, we created the bigram word clouds shown in Fig. 4. Fig. 4a demonstrates that in the first period of analysis, the students consider the university mobile application great, love it, or just think that it is good and easy to use. Nevertheless, there are students who are not satisfied with the application offered by their university and therefore require that the issues be fixed or new extra features added. On the contrary, in


(a) (b)

Fig. 3: Word cloud of single words for student mobile application reviews in the year a) 2016 and b) 2018

Fig. 4b, we can see that in the second period of analysis, many of the students are not satisfied with the applications because some of them do not work, do not open, crash, or do not accept users' credentials. Yet, there is a portion of the comments that imply a positive user experience with the application offered by their university. Based on the single-word and bigram word clouds, we can conclude that there is a trend of greater criticism of the newer versions of the student applications. This fact coincides with the results obtained from the analysis of the user ratings based on the world university ranking. Since the mobile applications have become an inevitable part of students' everyday routines, their requirements for more reliable and feature-rich applications have grown. As the mobile devices become more complex and equipped with more diverse and sophisticated sensors, the expectations for more complex applications are a natural consequence. Moreover, the mobile application market offers so many free, feature-rich applications for navigation, content sharing, and text and voice communication that the university mobile applications must maintain a high quality of service to keep their users satisfied. The universities should pay special attention to the services that are unique to the educational process, such as course information, office hours, course materials and campus navigation. Based on the analysis of the reviews of all university applications that were found on the market, we summarize that the users from all universities most appreciate the detailed information provided by the mobile applications. They like the good design of the applications, their responsiveness and the overall user experience. In most of the applications, the users are satisfied with the accurate course information and schedules.
They especially appreciate the applications that embed useful information directly, instead of providing links that redirect the user to external apps or browsers. Another feature that users find very useful is the maps and routes that navigate them around the university campuses. The users of the student mobile applications, in most


(a) (b)

Fig. 4: Word cloud of bigrams for student mobile application reviews in the year a) 2016 and b) 2018

of the cases, are irritated when the application crashes, feels buggy and unstable, or when the provided information is inaccurate and out of date. Reduced performance, such as long waiting times for retrieval of the latest data from the Internet and slow response due to the delay when switching between contexts and modules in the application, is another reason that makes users unsatisfied. The users also dislike it when the applications do not keep the user's credentials and hence require frequent re-logging when accessed after a certain period.
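The bigram word clouds of Sect. 3.2 rest on a simple pair-and-count step over the preprocessed tokens; a minimal sketch in plain Python (the paper builds the pairs on the NLTK-preprocessed token array):

```python
from collections import Counter

def bigrams(tokens):
    # Pair each token with its successor: [a, b, c] -> [(a, b), (b, c)].
    return list(zip(tokens, tokens[1:]))

# Toy token stream; in the paper this is the preprocessed comment corpus.
tokens = ["not", "work", "great", "not", "work"]
freq = Counter(bigrams(tokens))
most_common = freq.most_common(1)[0]
```

Counting pairs instead of single words disambiguates verbs like "work", since "not work" and "work great" land in different bins.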

4 Desirable behavior of the mobile applications

Judging by the contents of their reviews, the users of the analyzed mobile applications would appreciate a quick response with accurate information. They would like to receive notifications when important information, such as schedules, grades or new materials, is added or changed. Another feature required by the users is access to the university mail from the mobile application, including attaching files from their mobile devices. Users also find very useful the trend of opening everything inside the application, without using the default browser for accessing external links. Overall, the users require consistent access provided by a stable application that runs smoothly, with fast switching between screens and contexts. The most important requirements are low memory utilization and uninterrupted access to all functionalities, especially the basic ones like search, maps, navigation and local public transport. Another important requirement by the users is a functionality through which authenticated users can communicate and collaborate with other students and teachers.


5 Conclusions

This paper presents a brief overview of mobile student services and their basic functionalities. A set of 50 world-class universities was considered and their mobile applications were examined. The applications were categorized into two major groups based on whether they require authentication or not. The services of these applications were further categorized according to their basic functionalities and analyzed. To quantitatively measure the influence of these features, ratings, numbers of downloads, and comments were collected from the mobile application markets of the most common mobile operating systems for two different periods: before 2016 and before 2018. We defined four measures of user satisfaction and used them to analyze the influence of the features of the applications on their overall rating. The results show that the features related to non-educational activities like sports, multimedia and dining enjoy the highest popularity, with a slight change of rank between the two time periods. We also show that the overall popularity of the applications used to be related to the university ranking in the past years, while in recent years the university ranking has had nothing to do with the attractiveness of its mobile application. The analysis of the user comments additionally shows that the students used to share a more positive experience with the services offered by the applications; however, after the latest updates, students are considerably unsatisfied with their stability and outdated information. They also criticize certain features and require new updates in which the issues are fixed. Nevertheless, there are still universities that offer great applications with features that make the life of the students easier, especially those who are at the beginning of their studies.
The results of this paper can serve as a guideline for university application developers, outlining the essential features of a state-of-the-art student mobile application and emphasizing the importance and influence of each feature and functionality for gaining satisfied users. This task is becoming a bigger challenge considering that smart mobile devices are becoming more powerful, while, at the same time, new free applications offering a wealth of features are emerging on the market and becoming the main environment for information retrieval, communication and navigation.

References

1. Concordia online education, "The U.S. Digital Consumer Report", https://www.nielsen.com/us/en/insights/reports/2014/the-us-digital-consumer-report.html
2. Pew Research Center, "About a quarter of U.S. adults say they are 'almost constantly' online", http://www.pewresearch.org/fact-tank/2018/03/14/about-a-quarter-of-americans-report-going-online-almost-constantly
3. Pew Research Center, "Mobile fact sheet", http://www.pewinternet.org/fact-sheet/mobile/
4. U.S. smartphone use in 2015, https://www.pewinternet.org/2015/04/01/us-smartphone-use-in-2015/


5. Aakhus, M.A., Katz, J.E.: Perpetual Contact: Mobile Communication, Private Talk, Public Performance. Cambridge University Press (2002)
6. Aldrich, A.W.: Universities and libraries move to the mobile web. Educause Quarterly 33(2), 5 (2010)
7. Alfawareh, H.M., Jusoh, S.: The use and effects of smartphones in higher education. International Journal of Interactive Mobile Technologies (iJIM) 11(6), 103–111 (2017)
8. Alwraikat, M.A.: Smartphones as a new paradigm in higher education overcoming obstacles. International Journal of Interactive Mobile Technologies (iJIM) 11(4), 114–135 (2017)
9. Asif, M., Krogstie, J.: Mobile student information system. Campus-Wide Information Systems 28(1), 5–15 (2011)
10. Bowen, K., Pistilli, M.D.: Student preferences for mobile app usage. Research Bulletin (Louisville, CO: EDUCAUSE Center for Applied Research), available from http://www.educause.edu/ecar (2012)
11. Clayton, K., Murphy, A.: Smartphone apps in education: Students create videos to teach smartphone use as tool for learning. Journal of Media Literacy Education 8(2), 99–109 (2016)
12. Delialioğlu, Ö., Alioon, Y.: Student preferences for m-learning application characteristics. International Association for Development of the Information Society (2014)
13. Ekanayake, S.Y., Wishart, J.: Mobile phone images and video in science teaching and learning. Learning, Media and Technology 39(2), 229–249 (2014)
14. Gikas, J., Grant, M.M.: Mobile computing devices in higher education: Student perspectives on learning with cellphones, smartphones & social media. The Internet and Higher Education 19, 18–26 (2013)
15. Jesse, G.R.: Smartphone and app usage among college students: Using smartphones effectively for social and educational needs. In: Proceedings of the EDSIG Conference. p. n3424 (2015)
16. Kretovics, M.: Commuter students, online services, and online communities. New Directions for Student Services 2015(150), 69–78 (2015)
17.
Mayuran Sivakumaran, P.I.: The Mobile Economy 2018. Tech. rep., Groupe Speciale Mobile Association (2018)
18. Miao, G.: The construction and development of university mobile learning system. In: 2013 6th International Conference on Information Management, Innovation Management and Industrial Engineering. vol. 2, pp. 588–590. IEEE (2013)
19. Mohd Suki, N.: Students' dependence on smart phones: The influence of social needs, social influences and convenience. Campus-Wide Information Systems 30(2), 124–134 (2013)
20. Pechenkina, E.: Developing a typology of mobile apps in higher education: A national case-study. Australasian Journal of Educational Technology 33(4) (2017)
21. Roberts, P., Dunworth, K.: Staff and student perceptions of support services for international students in higher education: A case study. Journal of Higher Education Policy and Management 34(5), 517–528 (2012)
22. Rowley, J., Banwell, L., Childs, S., Gannon-Leary, P., Lonsdale, R., Urquhart, C., Armstrong, C.: User behaviour in relation to electronic information services within the UK higher education academic community. Journal of Educational Media 27(3), 107–122 (2002)
23. Shilpi Taneja, A.G.: A mobile app architecture for student information system. International Journal of Web Applications 7(2), 56–63 (2015)


24. Wilson, S., McCarthy, G.: The mobile university: from the library to the campus. Reference Services Review 38(2), 214–232 (2010)


Rating and Geo-Mapping of Travel Agencies that Provide Vacations to Long-Distance Destinations from Skopje, North Macedonia

Dina Gitova, Elena M. Jovanovska, Andreja Naumoski

Ss. Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering, Skopje, North Macedonia [email protected], [email protected], [email protected]

Abstract. Specialized niche services, such as offers for long-distance destinations, can be considered a determining strategy for travel agencies to acquire market share, differentiate themselves from competitors, and add higher value for the customer. The goal of this paper is to identify, rate and map the travel agencies in Skopje, North Macedonia that offer trips to long-distance destinations, to provide basic information about them, and to derive a ranking system that represents them proportionally according to the user ratings and the number of offers. Travel agencies that fulfill a pre-determined set of criteria are weighted according to the user rating and the number of users that left a rating. Furthermore, to include travel agencies that provide the service but have no user ratings, a ranking based on the number of available offers is presented. Moreover, the creation of a geographical database provides the customers with a sorted list of useful information regarding the expected service, visually presenting the aggregated data. The visual-spatial elaboration of the evaluation is produced with the ArcGIS software. Implementing the results of the research in other map applications, with a special focus on mobile technologies, can provide a transparent ranking of the travel agencies.

Keywords: Travel agencies ∙ Long-Distance Vacations ∙ ArcGIS.

1 Introduction

Travel agencies are continually repositioning their services to fit the changing tourism industry. The internet, as a platform for sharing experiences, has shifted demand from mass-market travel arrangements containing standard packages to individually customized niche services. The business model of the travel agencies must be adapted to the current demand for offers that include distinguished travel packages to long-distance destinations. Moreover, marketing strategies are influenced by the internet as a distribution channel and e-intermediary for interaction with the target customers and a wider audience. Furthermore, an updated web page serves the purpose of a new virtual office.


Having understood the taste of the global traveler who is interested in a unique travel experience, the goal of this paper is to represent the travel agencies in Skopje, North Macedonia that provide vacations to unusual destinations, proportionally to the number of offers as well as the user rating left on Google.com. According to data from the State Statistical Office of North Macedonia, 51.99% of the population of the capital city Skopje went on a vacation in 2016 [1]. To find the best vacation package, travelers have at their disposal 190 travel agencies in the city of Skopje; however, many of them offer arrangements only to the neighboring countries. Since the goal of this paper is to find out how many travel agencies in Skopje provide specialized niche offers to long-distance destinations, the e-channel used for the selection of the agencies was the travel agency's website, consistent with many studies that use website evaluation frameworks and criteria [2, 3]. Additionally, the customers use the website for travel planning and seek reassurance from review sites. Moreover, a qualitative analysis of the available information was performed, i.e., a selection framework of pre-determined criteria was developed to ensure the accuracy of the data found and to create a ranking system. This data can be used for travel pattern discovery and market potential analysis, as in [4]. Furthermore, recommender systems can be built upon this data to further improve the analysis [5, 6]. The model was expanded using a Geographic Information System (GIS) that satisfies the need for creating, handling and visualizing geodata. With the aim of integrating tourism-related information, adding a spatial component to the interpretation of the findings, and allowing different possibilities for implementation of the research, one of the most popular GIS software packages was used, ArcGIS by Esri.
Additionally, geospatial visualization via this software was used to identify trends that were proportionally represented. The research results will enable travelers to quickly and easily find travel agencies that offer trips to long-distance destinations, together with all the relevant information regarding location and contact data such as e-mail, phone and website. Listing and ranking the travel agencies that offer long-distance trips will ease the search for the perfect vacation in a new destination. Moreover, the results of the research can be further developed into mobile apps that use maps to better represent various places in a city. Having introduced the research background, the remaining sections provide a more detailed description of the created database with respect to the analyzed factors. Afterward, the connection between the database and the GIS software is elaborated, explaining the procedure in detail. Subsequently, the weighted modelling approach, as the theoretical backbone of the model, together with the interpolation and the symbology of the modelling in the GIS software, accounts for the results of the study. In conclusion, the findings of the study, together with the future possibilities for development and use of the research, are presented.


2 Method and materials

2.1 Strategic Selection Framework

Stage 1: Data Preprocessing

1. Identification of all the travel agencies in Skopje
2. Analysis of data accuracy through website information
3. Selecting the agencies that provide travel arrangements to long-distance destinations

Stage 2: Data storage and management

1. Filling in the table with the required attributes that allow for the identification of travel agencies and the number of different offers to the previously described destinations
2. Weighing the data according to the user ratings

Stage 3: Data analysis with GIS

1. Interpolation and symbology of the data according to:
   (a) the number of different offers
   (b) the user ratings

2.2 Data Management

The source of the data used in this paper is the website Zlatna Kniga [7]. According to this website, there are 190 travel agencies in Skopje. The credibility of the provided data was verified by finding the website of each travel agency and, once confirmed, analyzing the offered arrangements to long-distance destinations. In this phase, the travel agencies that do not have a website or do not have valid information about the offered packages were discarded, which left 37 travel agencies that offer trips to long-distance destinations. For each travel agency, the following attributes are entered in the table:

Name – the name of the travel agency.
Address – the address of the travel agency.
Address comment – an additional comment about the location of the travel agency.
Phone – the travel agency's phone number.
Working hours – weekday and weekend working hours.
Destinations – all the long-distance destinations that the travel agency offers.
Email – the travel agency's email address.
Web page – the travel agency's website.
Longitude – degrees of longitude in decimals.
Latitude – degrees of latitude in decimals.
Rating Score – the rating grade from online users.
Users – the number of users that rated the travel agency.


Weighed – the calculated weighted combination of the review score (weighted with 0.3) and the number of users that rated the travel agency (weighted with 0.7).

2.3 Connecting the Map with the Data in ArcGIS

To create the visual representation of the travel agencies, the ArcMap tool, part of the ArcGIS software package, was used. As a base map for this project, we used the geographical map of Skopje, imported by selecting the Imagery with Labels option. On the map of Skopje with the municipality labels, a layer of street names was added from ArcGIS Online, another Esri product, to show the exact locations of the travel agencies. Depending on which information is currently needed, these layers can be turned on or off. We exported the database from the table into the geodatabase and onto the map. The locations of the 37 agencies are displayed on the map using the longitude and latitude information (see Fig. 1). For an accurate representation of the locations on the map, the coordinate system was changed from WGS_1984_Web_Mercator_Auxiliary_Sphere to WGS_1984.

Fig. 1. Representation of the travel agencies on the map, including labels with their names.


The coordinates were taken from the Map Coordinates website [8] for the locations available on Google Maps, and the coordinates for travel agencies that are not listed on Google Maps were entered manually from the same website. To ensure higher visibility, the symbols can be changed to any desired icon via the Symbology option. The icon chosen for the layer displaying the locations of all travel agencies is an airplane, as a representation of the travel agencies. Additionally, the labels are added via the Labeling toolbar and the Maplex Label Engine. To change the style, size, font and placement of the labels, we use the Label Manager. Because there are 37 agencies, multiple labels overlap. To improve visibility, the Scale Range can be set to hide the labels when the map is zoomed out beyond a scale of 1:12,500. More information needs to be provided to the users besides just the locations of the agencies and the labels. To avoid cluttering the visualization further, a hover function is added via Display Expressions, which shows the name of the agency and all the destinations that it offers, as shown in Fig. 1.

2.4 Weight Modeling Approach

The weight modelling approach was used to provide a rating that balances the different numbers of users, derived by weighing the user rating score with 0.3 and the number of users that gave that rating with 0.7. Consequently, the average rating of a higher number of users will outweigh the rating of a smaller sample. The formula used is:

weighted_score = (n / max n) * w1 + (m / max m) * w2    (1)

where n and m represent the review score and the number of user reviews, respectively, and w1 and w2 represent the weights. However, for the travel agencies that do not have many user ratings, another interpolation (see Interpolation and Symbology below) showcases the travel agencies according to the number of offers presented on the website. For the interpolation in this research, we used two criteria: the number of destinations offered by the travel agencies, and the total score from user reviews. Firstly, travel agencies that offer a higher number of choices are given the highest visibility, to benefit the customer who values diversity. Moreover, this enables the representation of the travel agencies that have no user rating. Secondly, another interpolation was performed based on the user rating (on a scale from 0 to 5, with 0 meaning no rating and 5 being the highest rating) and the number of users that rated the travel agencies. The data is weighted to even out the differences in sample size, i.e., to ensure comparable data. Furthermore, this interpolation shows the customers which are the best places to visit according to user reviews. To ensure the representation of the results stays within the borders of Skopje, a shapefile of North Macedonia is added to the project [9] and, from the attribute table, all the municipalities except Skopje are erased. This creates a shapefile with only the borders of Skopje. The result is shown in Fig. 2.
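The weighting in formula (1) can be sketched as follows; the weights 0.3 and 0.7 are the ones stated in the paper, while the function and variable names, and the assumption that the maxima are taken over all 37 agencies, are ours:

```python
def weighted_score(n, m, max_n, max_m, w1=0.3, w2=0.7):
    # n: review score (0-5), m: number of user reviews;
    # each is normalized by its maximum over all agencies, then weighted.
    return (n / max_n) * w1 + (m / max_m) * w2

# A perfect score with the largest review count yields the maximum of 1.0;
# a perfect score from very few reviewers is pulled down by the m-term.
top = weighted_score(5.0, 250, max_n=5.0, max_m=250)
```

Because w2 > w1, an average rating backed by many users outweighs a slightly higher rating backed by only a handful, which is exactly the balancing effect described above.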


Fig. 2. Shapefile representing the borders of Skopje as well as the travel agencies

This shapefile is assigned the same coordinate system as the rest of the data and is exported as a layer on the map. The interpolation analysis is done using the Interpolation option in the Geoprocessing menu, and the number of classes and the color selection are set up in the Layer Properties menu. In the new layer created to display the first interpolation, with respect to the number of destinations a travel agency provides, the symbology can be changed from simple points to another classification type. The symbols are sized according to the number of destinations offered by the travel agency, i.e., the higher the number of destinations, the bigger the symbol, which provides better visualization for the end user. The number of classes and the intervals, as well as the symbols, are arbitrary choices in the interpolation settings. The color combination chosen for the representation is Purple to Green, Diverging, Bright. In this layer, the hover function as well as labels are added to enable better recognition. For the second interpolation, the rating of the travel agencies and the number of users who rated them according to Google are used. The user ratings are adjusted according to the number of users, as explained in the Weight Modeling Approach section. The interpolation is executed following the above-mentioned procedure, within Skopje's borders. For higher visibility, 7 classes and colors from red to green are selected, starting from the lowest rating marked with red to the highest rating marked with green.
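The paper does not name the interpolation method behind the Geoprocessing option; a common choice in such GIS workflows is inverse distance weighting (IDW), so purely as an illustration, a minimal IDW sketch over point samples such as the agencies' weighted ratings:

```python
import math

def idw(points, x, y, power=2):
    # points: list of (px, py, value); estimate the value at (x, y) as a
    # distance-weighted average, with weight 1 / d**power per sample.
    num = den = 0.0
    for px, py, v in points:
        d = math.hypot(x - px, y - py)
        if d == 0:
            return v  # query point coincides with a sample
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

# Three hypothetical agencies with weighted ratings; estimate a point
# midway between the first two, far from the third.
samples = [(0.0, 0.0, 0.2), (2.0, 0.0, 0.8), (0.0, 3.0, 0.5)]
estimate = idw(samples, 1.0, 0.0)
```

The interpolated surface always stays within the minimum and maximum of the sample values, which is why the red-to-green classification over 7 classes maps cleanly onto it.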

3 GIS Modelling Results

The result of this project is a map with three layers. The first layer is a simple representation of the locations of all travel agencies included in the database, with the appropriate details. In the second layer, we can see the interpolation based on the diversity of destinations a travel agency provides. This is combined with symbols that differ in size for increased visibility and easier recognition of the agencies with the most choice for the user, as shown in Fig. 3.

63 ICT Innovations 2019 Web Proceedings ISSN 1857-7288

Fig. 3. Interpolation based on destinations offered (Top 5 travel agencies are labeled)

Since almost all of these agencies are located in the city center, the interpolation is best seen when zoomed in, as shown in Fig. 4.

Fig. 4. Interpolation results based on the total number of long-distance destinations offered in ArcGIS


The third and final layer (Fig. 5) shows the second interpolation, based on user reviews. Here, the travel agencies that offer the best services are shown in green, whereas the worst travel agencies are shown in red. As stated in the previous paragraph, the results are best seen when the map is zoomed in on the city center.

Fig. 5. Interpolation results in the city center based on user reviews using ArcGIS (R is the user satisfaction rating, while WR is the weighted rating based on (1))

According to the interpolation results, Savana Travel Agency is rated as the number one travel agency, with a weighted rating of 0.958, followed by Atlas plus (WR = 0.543). Although neither of these travel agencies offers more than 10 long-distance destinations, they offer the best service according to the users.

4 Conclusion

This research paper presented an independent evaluation of website data from travel agencies in Skopje, North Macedonia, with a focus on rating the agencies that offer arrangements to long-distance destinations. The accuracy of the available information was verified, followed by a ranking system based on user ratings as well as the number of offers provided. The weighting model used here for the user satisfaction rating allowed comparability of the data and more insight regarding the relationships in a geographical sense. By using the ArcGIS software, we can indicate which agencies in the city of Skopje offer the best service


based on user reviews, while at the same time offering the highest number of long-distance vacations. Moreover, the above-mentioned travel agencies and their offered services were located in geo-mapping software. The described procedure, with further filtering per user needs, allows for more personalized use, such as sub-selection according to the variety of offered destinations and user rating. Furthermore, all the data needed for contacting the agencies via the internet or in person is readily available. Finally, the research can be implemented in mobile applications that use maps or in algorithms for independent ranking. Customers will benefit from readily available, independently rated data for future decision making when planning a trip.

Acknowledgment. This work was partially financed by the Faculty of Computer Science and Engineering at the Ss. Cyril and Methodius University in Skopje.

References

1. Travels of the domestic population – statistical review: Transport, tourism and other services, 2016, State statistical office of Republic of Macedonia, http://www.stat.gov.mk/Publikacii/8.4.17.03.pdf, last accessed 2018/12/20.
2. Chiou, W. C., Lin, C. C., & Perng, C.: A strategic framework for website evaluation based on a review of the literature from 1995–2006. Information & Management, 47(5-6), 282-290, (2010).
3. Stojchevska, M., Naumoski, A. and Mitreski, K.: Modelling the Impact of the Hotel Facilities on Online Hotel Review Score for the City of Skopje. In: 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), 1-5, IEEE, (2018).
4. Bazazo, I. K., & Khanfar, S. M.: Using (GIS 9.3 Software) to Visualize Travel Patterns and Market Potentials of Aqaba City in Jordan, (2014).
5. Gao, M., Fu, Y., Chen, Y. and Jiang, F.: User-Weight Model for Item-based Recommendation Systems. JSW, 7(9), 2133-2140, (2012).
6. Symeonidis, P., Nanopoulos, A. and Manolopoulos, Y.: Feature-weighted user model for recommender systems. In: International Conference on User Modeling, 97-106, Springer, Berlin, Heidelberg, (2007).
7. Zlatina Kniga – Travel agencies in Skopje, https://zk.mk/search/?what=туристичка+агенција&filter_city=15457, last accessed 2018/11/11.
8. Map Coordinates, http://www.mapcoordinates.net, last accessed 2018/10/12.
9. Shapefile of Macedonia taken from http://download.geofabrik.de/europe/macedonia.html, last accessed 2018/10/12.


Using continuous integration principles for managing virtual laboratories

Dejan Stamenov, Dimitar Venov, Goran Petkovski, Boro Jakimovski, and Goran Velinov

Faculty of Computer Science and Engineering, Skopje, Republic of North Macedonia, Ss. Cyril and Methodius University [email protected], [email protected], {goran.petkovski,boro.jakimovski,goran.velinov}@finki.ukim.mk

Abstract. Researchers around the world are involved in the process of managing and processing large volumes of data while using a plethora of different tools. This increases the need for virtual laboratories that will allow scientists to create an environment for rapid data processing and analysis, using the latest available technology without spending too much time on configuration. In this paper, we present the Virtual Laboratory System (vLab) for managing laboratories on the cloud. The system is based on a service-oriented architecture, built on top of the Spring Framework and hosted on the EGI Federated Cloud (FedCloud). It provides an interface for managing clusters of virtual machines that are created by the service, providing tools for data processing and analysis, along with the required datasets for these operations. The architecture of the system and the tools used to achieve successful management of virtual laboratories are also presented, with the details of the cloud architecture where the vLab system is hosted.

Keywords: FedCloud · Virtual laboratory · Virtual organization · Web service.

1 Introduction

Data scientists around the world come from established fields such as statistics, engineering, physics, machine learning, business intelligence, neuroscience, and more, having special competences (skills and knowledge) that enable them to extract value from digital data. Scientific communities are involved in creating and maintaining data products, in which the end users contribute towards improving the product. Users are increasingly interacting with real-time data and services, thus generating lots of data. Data science implies a holistic approach and uses scientific methods to extract value from data. This raises the need for virtual laboratories that will allow scientists to create an environment for rapid data processing and analysis, using the latest available technology. In this paper we present our Virtual Laboratory System (vLab), a service-oriented architecture hosted on the EGI Federated Cloud (FedCloud) [3] that


provides a service for creating and managing virtual laboratories. OpenNebula is used as the cloud computing platform in the data center in our solution, implementing the Open Cloud Computing Interface (OCCI) for managing and monitoring cloud resources. Our system features an Application Programming Interface (API), which provides automation in the process of managing computational resources for the end users. Up until this point, a cloud administrator was required to manually set up the cluster of virtual machines and then apply the proper configuration before it could be used by researchers. With this service, we provide the possibility for each data scientist to create clusters of virtual machines in the cloud without the need for a dedicated cloud administrator. The clusters of virtual machines created by the service provide tools for data processing and analysis, along with the required datasets. The virtual machines in the cluster are automatically provisioned and configured using Puppet [7]. The Hadoop [1] open-source software framework is used in the virtual machines for distributed storage and processing of big data. The paper structure is as follows. Related work is presented in Section 2. The FedCloud structure is provided in Section 3, while the implementation of the vLab service is explained in detail in Section 4. Section 5 describes the process of automating the deployment of a Hadoop cluster using Puppet. Our use case application is provided in Section 6, while Section 7 concludes the paper and presents future work.

2 Related work

The Atlas Tier 3 Cluster Tool (at3c) consists of a remote repository for checking cluster parameters and configuration of the hardware environment based on real-world use scenarios. Its main advantage is quick recovery from undesirable situations (failed hardware, shortage of a sustainable environment, etc.) [4]; in contrast, our system provides automatic configuration only of Hadoop environments. Another collaborative tool for building common e-Infrastructures is the CHAIN-REDS project. Its existing grid infrastructure contains materials from researchers around the world, with the mission to provide computational resources to scientists and organizations. Its integration with EGI (European Grid Infrastructure) and FedCloud has produced a functional solution for storing cloud images and templates. Our service-oriented architecture shares similarities with this solution, with CHAIN-REDS having one advantage: creation and management of mini-clouds in different VRCs (Virtual Research Communities) from all continents [11], whereas our system gives access only to EGI's infrastructure.

3 FedCloud structure

The EGI Federated Cloud represents a heterogeneous research community with the aim to provide a distributed collection of computational resources. It integrates public and community clouds into a scalable computing platform for data and/or compute


driven applications and services. Each cloud can provide Infrastructure as a Service (IaaS) features for users, i.e., CPU/GPU computing, storage, and network configuration. Virtual Organizations (VOs) are formed within the EGI Federated Cloud. Each VO of the cloud federation is a resource allocation for a specific community or purpose. The cloud providers who support a given community join the VO dedicated to that community. Users and their applications can consume cloud resources from the VO after becoming members of that VO. X.509 certificates are used to control access to the cloud resources.

4 vLab service

The vLab service is the main part of the cloud architecture which provides computational services to the end users. It provides services for the creation of computational resources1, along with services to remove these resources and to list the available computational resources for each end user. Each computational resource is created based on a configuration, which includes the specification of the virtual machines being created (in terms of computational power, storage, and memory), the number of virtual machines being created (these virtual machines form an independent cluster), and the configuration files by which the virtual machines in the cluster will be configured to be aware of the existence of the other machines in the cluster.

4.1 Technical overview

The API (Application Program Interface) is built with Java technology and the Spring Framework [12], using the REST architecture (used over the HTTP protocol). The vLab system API does not directly communicate with the cloud; an intermediate framework, jOCCI [5], is used, which provides the functionality of accessing different instances of the cloud when performing computational services on a user's request. In addition, PostgreSQL [6] is used as the main database of the API, storing virtual machine configurations, users' tokens, and the computational resources that are created for each token. The token represents a unique set of characters for each user of the cloud, which is sent to the API when a service is requested by the end user. Because every virtual machine created by the API needs additional configuration to become part of a cluster, the Git repository management system GitLab is used to publish configuration data. This repository is managed by the API itself; when new computational resources are created, configuration files are published in the repository. When resources are removed from the cloud, the configuration data for these resources is removed from the repository. Puppet is used as an orchestration tool for automatic deployment and configuration of parameters in the Hadoop cluster. The description is stored in the Git repository, so when users apply the Puppet template it is automatically pulled from the repository

1 "Computational resources" or "virtual machines": both terms have the same meaning further in the paper.


and provisioned into FedCloud. In this way, users from different backgrounds can create custom scenarios for using a cluster management tool (in this paper a Hadoop cluster is described) by managing the related Git repository. The FedCloud VMs are deployed on OpenNebula cloud infrastructure. The architectural overview is given in Fig. 1.

Fig. 1. vLab system - architectural overview (researchers request a laboratory through the vLab API, which creates the cluster of virtual machines on FedCloud and publishes each machine's configuration to GitLab, from where Puppet applies it to every virtual machine in the cluster).

4.2 API interface

The vLab system API provides multiple interface calls to meet the requirements for management of computational resources on the cloud. The most important API call is the one that creates clusters of virtual machines on the cloud - POST: resource/createresource. This API call has the following required parameters:

– userToken - a unique set of characters for each user of the cloud, used to identify the user,
– configuration - the computational specification for each virtual machine that is being created as part of a cluster (in terms of computational power, storage, and memory), and
– key - the SSH key which will be used by the user to connect to the virtual machines.

After the virtual machines are created, the API call GET: resource/listresources can be used to retrieve the IP addresses of the computational resources that were created. This API call has the following required parameters:

– userToken - used to identify which virtual machines were created for the given user token; the call then returns the IP addresses of these computational resources, and


– configuration - one user may have a different number of virtual machines created on the cloud. Each virtual machine is created based on a given configuration; this parameter is used to filter the machines whose IP addresses will be returned by the API call.

When the computational resources created by the user are no longer needed, the API call DELETE: resource/deleteresource will delete all the resources, based on the required parameter: userToken.
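The three calls above can be sketched from the client side as follows. Only the endpoint paths and parameter names are taken from the text; the base URL is a placeholder (the paper does not publish one):

```python
def build_request(action, **params):
    """Assemble one of the three vLab API calls described above;
    raises if a required parameter from the paper's description is missing."""
    endpoints = {
        "create": ("POST", "resource/createresource",
                   {"userToken", "configuration", "key"}),
        "list":   ("GET", "resource/listresources",
                   {"userToken", "configuration"}),
        "delete": ("DELETE", "resource/deleteresource",
                   {"userToken"}),
    }
    method, path, required = endpoints[action]
    missing = required - params.keys()
    if missing:
        raise ValueError("missing required parameters: %s" % sorted(missing))
    # placeholder base URL, not a real deployment
    return {"method": method,
            "url": "https://vlab.example.org/" + path,
            "params": params}
```

A full lifecycle is then create, list, delete, all keyed by the same userToken.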

4.3 PostgreSQL database design

PostgreSQL is used as the main database of the API, storing configuration data about the clusters of virtual machines and keeping track of the user's cloud resources based on their tokens. An overview of the database design is presented in Fig. 2. The VM CONFIGURATION table is used to store data about different cluster setups. For example, here we keep data on how to set up all the virtual machines in a given cluster when their configuration is based on Hadoop. This allows us to define different setups in the future, so that the API will be able to provision virtual machines in the cluster based on the data provided in the table. The USER TOKENS table is used to store each token that is sent by the user, while also tracking the number of resources being used by the user. The RESOURCE DETAILS table references the USER TOKENS table, while also storing data about the virtual machines that are created for the user. This is done to lower the load on the cloud; if we did not store this data in the database and retrieve it when needed, we would have to query the cloud for each user to get the details about each virtual machine on the user's request. The integrity of this data is maintained on each action of the user, meaning that when the user removes a cluster or creates a new one, the corresponding data stored in the database is removed or added.
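The design described above can be sketched as DDL. The column names follow Fig. 2, but the types are assumptions (the paper does not list them), and SQLite is used here only to keep the sketch of the PostgreSQL schema self-contained:

```python
import sqlite3

# In-memory sketch of the three tables from Fig. 2; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vm_configuration (
    configuration_id INTEGER PRIMARY KEY,
    virtual_organization TEXT,
    virtual_machine_template TEXT,
    virtual_machine_template_size TEXT,
    virtual_machine_memory INTEGER,
    virtual_machine_cpu INTEGER,
    number_virtual_machines INTEGER
);
CREATE TABLE user_tokens (
    user_token TEXT PRIMARY KEY,
    number_of_resources INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE resource_details (
    resource_id INTEGER PRIMARY KEY,
    user_token_id TEXT NOT NULL REFERENCES user_tokens(user_token),
    resource_url TEXT,
    resource_ip TEXT,
    vm_type TEXT,
    virtual_organization TEXT
);
""")
```

Caching resource_ip in resource_details is what lets the API answer listresources without querying the cloud for every request.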

5 Hadoop deployment with Puppet

In this section we describe the process of automating the deployment of a Hadoop cluster using the Puppet configuration management tool. First, in Subsection 5.1 we give a short introduction to Hadoop, then in Subsection 5.2 we give a short introduction to Puppet, and finally, in Subsection 5.3 we provide the details of the deployment process.

5.1 Hadoop

Hadoop [1] is an open-source software framework for large-scale distributed data storage and data analysis [13]. The framework is designed to scale from a single machine to thousands of machines, where each machine offers local


Fig. 2. vLab API database design (VM_CONFIGURATION: configuration_id (PK), virtual_organization, virtual_machine_template, virtual_machine_template_size, virtual_machine_memory, virtual_machine_cpu, number_virtual_machines; USER_TOKENS: user_token (PK), number_of_resources; RESOURCE_DETAILS: resource_id (PK), user_token_id (FK), resource_url, resource_ip, vm_type, virtual_organization).

storage and computation. Hadoop follows the master-slave architecture, where a cluster is composed of one NameNode (the master) and one or more DataNodes (the slaves). The NameNode manages the file system namespace and regulates access to files by clients, and the DataNodes manage storage and computation. The main goals of Hadoop are quick detection of faults and automatic recovery from them, high throughput of data access rather than low latency of data access, support for large files in the order of terabytes, and support for tens of millions of files in a single instance [2]. The framework includes the following modules:

– Hadoop Common - common utilities that support the other modules,
– Hadoop Distributed File System (HDFS) - a distributed file system that provides high-throughput access,
– Hadoop YARN - a framework for job scheduling and cluster resource management, and
– Hadoop MapReduce - an implementation of the MapReduce programming model for processing large data sets.

5.2 Puppet

Puppet [7] is a configuration management tool which is written in the Ruby [10] programming language and allows us to control and automate the configuration of all elements comprising an IT system, such as hosts, installed software,


Fig. 3. vLab system - web interface overview.

users, running services, configuration files, scheduled tasks, storage, monitoring, and security. Through Puppet, we can describe the system's desired end state using either Ruby or a Ruby domain-specific language (DSL). A Puppet-based infrastructure generally consists of a master server with agents installed on each client/node (master-slave architecture).

5.3 Automatic Deployment of Hadoop With Puppet

Deploying Hadoop consists of unpacking the desired version of Hadoop in the desired place on all of the machines in the cluster. Then, the configuration files are set up to include the address of the NameNode (the master) and the addresses of the DataNodes (the slaves), as well as some additional configuration parameters. Then, we need to enable passwordless communication between the NameNode and all of the DataNodes, which includes the creation of a special user for Hadoop. Finally, we need to format HDFS and start Hadoop from the NameNode. To automate this, we have developed a Puppet module composed of multiple Puppet classes that deploy Hadoop on all of the machines in the cluster. One of the challenges that we faced was synchronization between all of the Puppet agents. In order to start Hadoop, we have to first deploy Hadoop on all of the nodes, and then start it. But, because Puppet catalogs are executed on Puppet's agents independently, the NameNode Puppet agent does not know whether Hadoop has been deployed on all of the other servers. To achieve this synchronization, we have used PuppetDB [9] with Facter [8]: the facts reported by the agents are stored in PuppetDB, through which they are accessible to all of the Puppet agents. The NameNode Puppet agent reads these facts and, as soon as all of the DataNodes are set up, continues with its execution, which includes formatting HDFS and starting the cluster.
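The fact-based synchronization can be illustrated with a toy model. `FactStore` is a hypothetical stand-in for PuppetDB, and the fact name `hadoop_deployed` is an assumption, not the module's actual fact:

```python
class FactStore:
    """Toy stand-in for PuppetDB: a shared store of per-node facts."""
    def __init__(self):
        self.facts = {}

    def publish(self, node, fact, value):
        # Each agent reports its own facts (via Facter in the real setup).
        self.facts.setdefault(node, {})[fact] = value

    def nodes_with(self, fact, value):
        return {n for n, f in self.facts.items() if f.get(fact) == value}


def namenode_may_start(store, datanodes):
    """The NameNode agent proceeds (format HDFS, start the cluster)
    only once every expected DataNode has reported readiness."""
    return store.nodes_with("hadoop_deployed", "true") >= set(datanodes)
```

On each Puppet run the NameNode agent re-evaluates this condition, so the cluster start is deferred until the last DataNode has published its fact.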

6 Use case application

As part of our development of and contribution to the EGI project, we have developed a web application which allows data scientists to create virtual laboratory


Fig. 4. vLab system - available clusters.

Fig. 5. vLab system - IP addresses from the machines in the cluster.

environments using Hadoop, while effectively managing its instances. Users are authenticated with X.509 certificates that are used in EGI services.

Fig. 3 displays the default interface of the application. Fig. 4 shows the option for selecting the virtual laboratories provided by the web application. When the type of virtual laboratory is chosen, the application asks for the public SSH key which will be used for signing in to the created virtual machines. The users are able to choose the desired scale of the cluster in the cloud. Finally, after successful provisioning of the machines, the web application provides the IP addresses of every instance that is part of the created cluster, as shown in Fig. 5. These IP addresses are public, which means the users can sign in from any device with Internet connectivity. After finishing their work, the cluster can be shut down with a graceful poweroff signal sent to the interface of OpenNebula. The virtual machines are then terminated and deleted from the list of available machines.


7 Conclusion and future work

In this paper we presented our service-oriented architecture for managing virtual laboratories - the Virtual Laboratory System (vLab), hosted on the EGI Federated Cloud (FedCloud). The virtual machines created by the API provide tools for data processing and analysis. The details of the API were provided, along with the Puppet configuration, which provides automation in the process of deploying the Hadoop machines. The EGI Federated Cloud structure was also described. In the future we are planning to conduct research to test whether the Virtual Laboratory System (vLab) can be deployed on other cloud providers like Google Cloud Platform, Elastic Compute Cloud, and Microsoft Azure, and if not, what changes would be required to make the system cross-platform. We are also planning to extend our service-oriented architecture to include other laboratories, for example Apache Spark, and to provide more configuration options for all of the existing laboratories.

References

1. Apache: Hadoop documentation (2019), http://hadoop.apache.org/docs/current/
2. Borthakur, D.: The Hadoop distributed file system: Architecture and design (2008)
3. Foundation, T.E.: The EGI federated cloud (2019), https://www.egi.eu/federation/egi-federated-cloud/
4. Hendrix, V., Benjamin, D., Yao, Y.: Scientific cluster deployment and recovery - using Puppet to simplify cluster management. In: Journal of Physics: Conference Series. vol. 396, p. 042027. IOP Publishing (2012)
5. Kimle, M.: jOCCI API client library (2019), https://github.com/Misenko/jOCCI-api
6. PostgreSQL: PostgreSQL documentation (2019), https://www.postgresql.org/docs/
7. Puppet: Puppet documentation (2019), https://puppet.com/docs/
8. Puppet: Puppet Facter documentation (2019), https://docs.puppet.com/facter/
9. Puppet: PuppetDB documentation (2019), https://docs.puppet.com/puppetdb/
10. Ruby: Ruby documentation (2019), https://www.ruby-lang.org/en/documentation/
11. Ruggieri, F., Andronico, G., Barbera, R., Matyska, L.: Introduction to the CHAIN-REDS project (objectives and achievements). In: e-Infrastructures for e-Sciences 2013, a CHAIN-REDS Workshop organised under the aegis of the European Commission. vol. 199, p. 001. SISSA Medialab (2014)
12. Software, P.: Spring Framework (2019), https://spring.io/
13. Taylor, R.C.: An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. In: BMC Bioinformatics. vol. 11, p. S1. BioMed Central (2010)


Visualization of the nets of type of Zaremba-Halton constructed in generalized number system⋆

V. Dimitrievska Ristovska1, V. Grozdanov2, and Ts. Petrova2

1 Faculty of Computer Science and Engineering, University ”SS. Cyril and Methodius”, 16 Rugjer Boshkovikj str. 1000, Skopje, Macedonia [email protected] 2 Department of Mathematics, Faculty of Natural Sciences and Mathematics, South West University ”Neofit Rilski”, 66 Ivan Michailov str., 2700 Blagoevgrad, Bulgaria [email protected],[email protected]

Abstract. In the present paper a class of two-dimensional nets $Z^{\kappa,\mu}_{B_2,\nu}$ of type of Zaremba-Halton constructed in a generalized $B_2$-adic system is introduced. In order to show their very good uniform distribution, we visualize them with the mathematical software Mathematica.

In our paper "On the $(\mathrm{Vil}_{B_2}; \alpha; \gamma)$-diaphony of the nets of type of Zaremba-Halton constructed in generalized number system" (in preparation) we constructed a class $Z^{\kappa,\mu}_{B_2,\nu}$ of two-dimensional nets (throughout this paper the term net will denote a finite sequence) of type of Zaremba-Halton. Also, the $(\mathrm{Vil}_{B_2}; \alpha; \gamma)$-diaphony of the nets of the class $Z^{\kappa,\mu}_{B_2,\nu}$, which is based on using two-dimensional Vilenkin functions constructed in the same $B_2$-adic system, is investigated. The obtained results have a theoretical character and treat the influence of the parameter $\alpha$ on the exact order of the $(\mathrm{Vil}_{B_2}; \alpha; \gamma)$-diaphony of the nets from the class $Z^{\kappa,\mu}_{B_2,\nu}$.

The purpose of this paper is to present in extended form the visualization of some concrete nets from the class $Z^{\kappa,\mu}_{B_2,\nu}$ and to show the distribution of the points of these nets. In this sense the reader can understand the distribution properties of the considered nets. To construct nets of the class $Z^{\kappa,\mu}_{B_2,\nu}$ we use the mathematical software Mathematica. Our interest in these types of sequences stems from their usage in numerical integration, construction of random number generators, etc.

Keywords: Nets of type of Zaremba-Halton · Visualization.

1 Introduction

Let $s \ge 1$ be a fixed integer which will denote the dimension throughout the paper. The functions of some classes of complete orthonormal systems over the

⋆ Supported by the Faculty of Computer Science and Engineering at "Ss Cyril and Methodius" Univ. in Skopje


$s$-dimensional unit cube $[0,1)^s$ are used with great success as an analytical tool for investigation of the distribution of sequences and nets. Also, these systems stand at the basis of the definitions of some quantitative measures which show the quality of the distribution of sequences and nets. We will recall the definitions of the functions of some complete orthonormal function systems.

The first example is the trigonometric system. For an arbitrary integer $k$ the function $e_k : [0,1) \to \mathbb{C}$ is defined as $e_k(x) = e^{2\pi i k x}$, $x \in [0,1)$. For an arbitrary vector $\mathbf{k} = (k_1, \ldots, k_s) \in \mathbb{Z}^s$ the function $e_{\mathbf{k}} : [0,1)^s \to \mathbb{C}$ is defined as $e_{\mathbf{k}}(\mathbf{x}) = \prod_{j=1}^{s} e_{k_j}(x_j)$, $\mathbf{x} = (x_1, \ldots, x_s) \in [0,1)^s$. The set $\mathcal{T}_s = \{ e_{\mathbf{k}}(\mathbf{x}) : \mathbf{k} \in \mathbb{Z}^s, \mathbf{x} \in [0,1)^s \}$ is called the trigonometric function system.

Following Larcher, Niederreiter and Schmid [9] we will present the concept of the Walsh functions over finite groups. For this purpose let $m \ge 1$ be a given integer and $\{ b_1, b_2, \ldots, b_m : 1 \le l \le m,\ b_l \ge 2 \}$ be a set of fixed integers. For $1 \le l \le m$ let $\mathbb{Z}_{b_l} = \{0, 1, \ldots, b_l - 1\}$, the operation $\oplus_{b_l}$ be the addition modulo $b_l$, so that $(\mathbb{Z}_{b_l}, \oplus_{b_l})$ is the discrete cyclic group of order $b_l$.

Let $G = \mathbb{Z}_{b_1} \times \ldots \times \mathbb{Z}_{b_m}$ and the operation $\oplus_G = (\oplus_{b_1}, \ldots, \oplus_{b_m})$ be defined as follows: for arbitrary elements $g, y \in G$, where $g = (g_1, \ldots, g_m)$ and $y = (y_1, \ldots, y_m)$, let $g \oplus_G y = (g_1 \oplus_{b_1} y_1, \ldots, g_m \oplus_{b_m} y_m)$. Then, the couple $(G, \oplus_G)$ is a finite group of order $b = b_1 \cdots b_m$. For arbitrary $g, y \in G$ of the above form the character function $\chi_g(y)$ is defined as $\chi_g(y) = \prod_{l=1}^{m} \exp\left( 2\pi i \frac{g_l y_l}{b_l} \right)$. Let $\varphi : \{0, 1, \ldots, b-1\} \to G$ be an arbitrary bijection which satisfies the condition $\varphi(0) = 0$.

For an arbitrary integer $k \ge 0$ and a real $x \in [0,1)$ with the $b$-adic representations $k = \sum_{i=0}^{\nu} k_i b^i$ and $x = \sum_{i=0}^{\infty} x_i b^{-i-1}$, where $k_i, x_i \in \{0, 1, \ldots, b-1\}$, $k_\nu \ne 0$ and $x_i \ne b-1$ for infinitely many $i$, the $k$-th Walsh function over the group $G$ with respect to the bijection $\varphi$, ${}_{G,\varphi}\mathrm{wal}_k : [0,1) \to \mathbb{C}$, is defined as
$$ {}_{G,\varphi}\mathrm{wal}_k(x) = \prod_{i=0}^{\nu} \chi_{\varphi(k_i)}(\varphi(x_i)). $$

Let us denote $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. For an arbitrary vector $\mathbf{k} = (k_1, \ldots, k_s) \in \mathbb{N}_0^s$ the function ${}_{G,\varphi}\mathrm{wal}_{\mathbf{k}} : [0,1)^s \to \mathbb{C}$ is defined as ${}_{G,\varphi}\mathrm{wal}_{\mathbf{k}}(\mathbf{x}) = \prod_{j=1}^{s} {}_{G,\varphi}\mathrm{wal}_{k_j}(x_j)$, $\mathbf{x} = (x_1, \ldots, x_s) \in [0,1)^s$. The set $\mathcal{W}_{G,\varphi}^s = \{ {}_{G,\varphi}\mathrm{wal}_{\mathbf{k}}(\mathbf{x}) : \mathbf{k} \in \mathbb{N}_0^s, \mathbf{x} \in [0,1)^s \}$ is called a system of Walsh functions over the group $G$ with respect to the bijection $\varphi$. In the case when $m = 1$, $G = \mathbb{Z}_b$ and $\varphi = \mathrm{id}$ is the identity between the sets $\{0, 1, \ldots, b-1\}$ and $\mathbb{Z}_b$, the obtained function system $\mathcal{W}_{\mathbb{Z}_b,\mathrm{id}}$ is the system $\mathcal{W}(b)$ of the Walsh functions of order $b$ which was proposed by Chrestenson [1]. If $b = 2$ then the system $\mathcal{W}_{\mathbb{Z}_2,\mathrm{id}}$ is the original system $\mathcal{W}(2)$ of the Walsh [13] functions.

The numerical measures for the irregularity of the distribution of sequences and nets in $[0,1)^s$ give us the order of the inevitable deviation of a concrete


distribution from the ideal distribution. Generally, they are different kinds of the discrepancy and the diaphony. So, let $\xi_N = \{ \mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{N-1} \}$ be an arbitrary net of $N \ge 1$ points in $[0,1)^s$. For an arbitrary vector $\mathbf{x} = (x_1, \ldots, x_s) \in [0,1)^s$ let us denote $[\mathbf{0}, \mathbf{x}) = [0, x_1) \times \ldots \times [0, x_s)$ and $A(\xi_N; [\mathbf{0}, \mathbf{x})) = \# \{ \mathbf{x}_n : 0 \le n \le N-1,\ \mathbf{x}_n \in [\mathbf{0}, \mathbf{x}) \}$. The $L_2$-discrepancy of the net $\xi_N$ is defined as
$$ T(\xi_N) = \left( \int_{[0,1]^s} \left| \frac{A(\xi_N; [\mathbf{0}, \mathbf{x}))}{N} - \prod_{j=1}^{s} x_j \right|^2 dx_1 \ldots dx_s \right)^{\frac{1}{2}}. $$
In 1976 Zinterhof [14] introduced the notion of the so-called classical diaphony. So, the diaphony of the net $\xi_N$ is defined as

$$ F(\mathcal{T}_s; \xi_N) = \left( \sum_{\mathbf{k} \in \mathbb{Z}^s \setminus \{\mathbf{0}\}} R^{-2}(\mathbf{k}) \left| \frac{1}{N} \sum_{n=0}^{N-1} e_{\mathbf{k}}(\mathbf{x}_n) \right|^2 \right)^{\frac{1}{2}}, $$
where for each vector $\mathbf{k} = (k_1, \ldots, k_s) \in \mathbb{Z}^s$ the coefficient $R(\mathbf{k}) = \prod_{j=1}^{s} R(k_j)$ and for an arbitrary integer $k$

$$ R(k) = \begin{cases} 1, & \text{if } k = 0, \\ |k|, & \text{if } k \ne 0. \end{cases} $$
In order to study sequences and nets constructed in a $b$-adic system, in the last twenty years some new versions of the diaphony were defined. These kinds of the diaphony are based on using complete orthonormal function systems constructed also in base $b$. In 2001 Grozdanov and Stoilova [5] used the system $\mathcal{W}(b)$ of the Walsh functions to introduce the concept of the so-called $b$-adic diaphony. In 2003 Grozdanov, Nikolova and Stoilova [4] used the system $\mathcal{W}_{G,\varphi}$ of the Walsh functions over the group $G$ with respect to the bijection $\varphi$ to introduce the concept of the so-called generalized $b$-adic diaphony. So, the generalized $b$-adic diaphony of the net $\xi_N$ is defined as

$$ F(\mathcal{W}_{G,\varphi}; \xi_N) = \left( \frac{1}{(b+1)^s - 1} \sum_{\mathbf{k} \in \mathbb{N}_0^s \setminus \{\mathbf{0}\}} \rho(\mathbf{k}) \left| \frac{1}{N} \sum_{n=0}^{N-1} {}_{G,\varphi}\mathrm{wal}_{\mathbf{k}}(\mathbf{x}_n) \right|^2 \right)^{\frac{1}{2}}, $$
where for a vector $\mathbf{k} = (k_1, \ldots, k_s) \in \mathbb{N}_0^s$ the coefficient $\rho(\mathbf{k}) = \prod_{j=1}^{s} \rho(k_j)$ and for an arbitrary non-negative integer $k$

ρ(k) = { 1, if k = 0; b^{−2g}, if b^g ≤ k < b^{g+1}, g ≥ 0, g ∈ Z.


In 1954 Roth [11] obtained a general lower bound of the L_2-discrepancy of an arbitrary net. He proved that for any net ξ_N composed of N points in [0,1)^s the lower bound

T(ξ_N) > C(s) (log N)^{(s−1)/2} / N    (1)

holds, where C(s) is a positive constant depending only on the dimension s. In 1986 Proinov [10] obtained a general lower bound of the diaphony of an arbitrary net: for any net ξ_N composed of N points in [0,1)^s the lower bound

F(T_s; ξ_N) > α(s) (log N)^{(s−1)/2} / N    (2)

holds, where α(s) is a positive constant depending only on the dimension s. Cristea and Pillichshammer [2] obtained a general lower bound of the b-adic diaphony: for any net ξ_N composed of N points in [0,1)^s the lower bound

F(W(b); ξ_N) ≥ C(b, s) (log N)^{(s−1)/2} / N    (3)

holds, where C(b, s) is a positive constant depending on the base b and the dimension s.

About the exactness of the lower bound (1) of the L_2-discrepancy of two-dimensional nets the following results have been obtained. Let b ≥ 2 and ν > 0 be fixed integers. Following Van der Corput [12] and Halton [6], for an arbitrary integer i such that 0 ≤ i ≤ b^ν − 1 with the b-adic representation i = ∑_{j=0}^{ν−1} i_j b^j we put p_{b,ν}(i) = ∑_{j=0}^{ν−1} i_j b^{−j−1}. Let us denote η_{b,ν}(i) = i/b^ν and construct the net R_{b,ν} = {(η_{b,ν}(i), p_{b,ν}(i)) : 0 ≤ i ≤ b^ν − 1}. The net R_{2,ν} was introduced and studied by Roth [11]. Hammersley [8] used sequences of Van der Corput-Halton constructed in different bases to construct a class of s-dimensional nets. Roth proved that the L_2-discrepancy T(R_{2,ν}) of the net R_{2,ν} has an exact order O(log N / N), where N = 2^ν. According to the lower bound (1), the order O(log N / N) is not the best possible.

There exist two important techniques to improve the order of the L_2-discrepancy of an arbitrary net: a digital shift and a symmetrization of the points of the net. The first approach was realized in 1969 by Halton and Zaremba [7].
They used the original net of Van der Corput, p_{2,ν}(i) = 0.i_0 i_1 ... i_{ν−1}, i_j ∈ {0, 1}, 0 ≤ j ≤ ν−1, and changed the digits i_j standing on the even positions to 1 − i_j. In this way the coordinate z_{2,ν}(i) = 0.(1−i_0) i_1 (1−i_2) ..., i_j ∈ {0, 1}, 0 ≤ j ≤ ν−1, is obtained. The net Z_{2,ν} = {(η_{2,ν}(i), z_{2,ν}(i)) : 0 ≤ i ≤ 2^ν − 1}, which now usually is called the net of
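The digit constructions above are straightforward to sketch in code. The following Python snippet is an illustrative re-implementation (the function names are ours, not the authors'): it computes the radical inverse p_{b,ν}(i) and the even-position digit flip z_{2,ν}(i), and assembles the nets R_{2,ν} and Z_{2,ν}.

```python
def radical_inverse(i, b, nu):
    """p_{b,nu}(i): reflect the nu base-b digits of i about the radix point."""
    x, place = 0.0, 1.0 / b
    for _ in range(nu):
        i, d = divmod(i, b)      # d runs over the digits i_0, i_1, ...
        x += d * place
        place /= b
    return x

def zaremba_halton_digit(i, nu):
    """z_{2,nu}(i): like p_{2,nu}(i), but digits on even positions become 1 - i_j."""
    x, place = 0.0, 0.5
    for j in range(nu):
        i, d = divmod(i, 2)
        if j % 2 == 0:           # even position: flip the binary digit
            d = 1 - d
        x += d * place
        place /= 2
    return x

nu = 3
roth    = [(i / 2**nu, radical_inverse(i, 2, nu)) for i in range(2**nu)]
zaremba = [(i / 2**nu, zaremba_halton_digit(i, nu)) for i in range(2**nu)]
```

For ν = 3 each net consists of N = 8 points whose second coordinates exhaust the multiples j/8, since both digit maps are bijections on the ν-digit integers.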


Zaremba-Halton, is constructed. It is shown that the L_2-discrepancy of the net Z_{2,ν} has an order O(√(log N)/N), N = 2^ν, which of course is the best possible.

In 1998 Xiao proved the exactness of the lower bound (2) of the diaphony for dimension s = 2: the diaphony of the nets of Roth R_{b,ν} and of Zaremba-Halton Z_{b,ν}, both constructed in an arbitrary base b ≥ 2, has an exact order O(√(log N)/N), N = b^ν. Grozdanov and Stoilova [5] proved the exactness of the lower bound (3) of the b-adic diaphony for dimension s = 2: the b-adic diaphony of the nets R_{b,ν} and Z_{b,ν} has an exact order O(√(log N)/N), N = b^ν. Grozdanov [3] constructed the so-called net of Zaremba-Halton _{G,ϕ}Z_{b,ν}^{p,q} over the group G with respect to the bijection ϕ and obtained the exact order O(√(log N)/N) of the generalized b-adic diaphony of the nets R_{b,ν} and _{G,ϕ}Z_{b,ν}^{p,q}.

2 The nets of Zaremba-Halton type constructed in a generalized system

We recall the concept of the so-called generalized number system, or a system with variable bases. Let the sequence of integers B = {b_0, b_1, b_2, ... : b_i ≥ 2 for i ≥ 0} be given. The so-called generalized powers are defined by the following recursive equalities: we put B_0 = 1 and for j ≥ 0 define B_{j+1} = B_j · b_j. Then an arbitrary integer k ≥ 0 and a real x ∈ [0, 1) can be represented in the generalized B-adic number system in the form

k = ∑_{i=0}^{ν} k_i B_i and x = ∑_{i=0}^{∞} x_i / B_{i+1},

where for i ≥ 0, k_i, x_i ∈ {0, 1, ..., b_i − 1}, k_ν ≠ 0, and for infinitely many i we have x_i ≠ b_i − 1.

For arbitrary reals x, y ∈ [0, 1) which in the B-adic system have the representations x = ∑_{i=0}^{∞} x_i / B_{i+1} and y = ∑_{i=0}^{∞} y_i / B_{i+1}, where for i ≥ 0, x_i, y_i ∈ {0, 1, ..., b_i − 1} and for infinitely many i we have x_i, y_i ≠ b_i − 1, we define the operation

x ⊕_B^{[0,1)} y = ∑_{i=0}^{∞} (x_i + y_i (mod b_i)) / B_{i+1}  (mod 1).

In the next definition we present the concept of the sequence of Van der Corput constructed in the generalized B-adic system.


Definition 1. For an arbitrary integer i ≥ 0 which in the generalized B-adic system has the representation i = i_m B_m + i_{m−1} B_{m−1} + ... + i_0 B_0, where for 0 ≤ j ≤ m, i_j ∈ {0, 1, ..., b_j − 1} and i_m ≠ 0, we put

p_B(i) = i_0/B_1 + i_1/B_2 + ... + i_m/B_{m+1}.

The sequence ω_B = (p_B(i))_{i≥0} is called a sequence of Van der Corput constructed in the generalized system B.
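A compact Python sketch of Definition 1 (an illustrative implementation with our own names; `bases` holds b_0, b_1, ...) extracts the mixed-radix digits of i and reflects them about the radix point:

```python
def gen_powers(bases, n):
    """Generalized powers: B_0 = 1 and B_{j+1} = B_j * b_j."""
    B = [1]
    for j in range(n):
        B.append(B[-1] * bases[j])
    return B

def van_der_corput_B(i, bases):
    """p_B(i) = i_0/B_1 + i_1/B_2 + ... for i = i_0*B_0 + i_1*B_1 + ..."""
    x, j, B = 0.0, 0, 1
    while i > 0:
        i, d = divmod(i, bases[j])   # d is the digit i_j in {0, ..., b_j - 1}
        B *= bases[j]                # B is now B_{j+1}
        x += d / B
        j += 1
    return x
```

With constant bases b_j = b this reduces to the ordinary base-b radical inverse.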

The sequence ω_B has been investigated by many authors and used for different purposes. Now we explain the constructive principle of the two-dimensional nets (introduced in our previously mentioned paper) of Zaremba-Halton type constructed in a generalized system.

For this purpose let B_1 = {b_0^{(1)}, b_1^{(1)}, ..., b_{ν−1}^{(1)}, b_ν^{(1)}, b_{ν+1}^{(1)}, ... : b_j^{(1)} ≥ 2 for j ≥ 0} be a given sequence of bases. By using the sequence B_1 we define the sequence B_2 = {b_0^{(2)}, b_1^{(2)}, ..., b_{ν−1}^{(2)}, b_ν^{(2)}, b_{ν+1}^{(2)}, ... : b_j^{(2)} ≥ 2 for j ≥ 0} in the following manner: we put b_0^{(2)} = b_{ν−1}^{(1)}, b_1^{(2)} = b_{ν−2}^{(1)}, ..., b_{ν−1}^{(2)} = b_0^{(1)}, and the bases b_ν^{(2)}, b_{ν+1}^{(2)}, ... can be chosen arbitrarily. Let us denote 𝓑_2 = (B_1, B_2).

Let us assume that the sequences B_1 and B_2 of bases are bounded from above, i.e. there exists a constant M ≥ 2 such that for each j ≥ 0 and τ = 1, 2 we have b_j^{(τ)} ≤ M.

Let ν ≥ 1 be an arbitrary and fixed integer. We have B_ν^{(1)} = b_0^{(1)} b_1^{(1)} ... b_{ν−1}^{(1)} and B_ν^{(2)} = b_0^{(2)} b_1^{(2)} ... b_{ν−1}^{(2)} = b_{ν−1}^{(1)} b_{ν−2}^{(1)} ... b_0^{(1)}, so B_ν^{(1)} = B_ν^{(2)}. Let us denote B_ν = B_ν^{(1)} = B_ν^{(2)}. For 0 ≤ i ≤ B_ν − 1 let us define the quantity η_ν(i) = i/B_ν.

Let κ = 0.κ_{ν−1}κ_{ν−2}...κ_0, where for 0 ≤ j ≤ ν−1, κ_{ν−1−j} ∈ {0, 1, ..., b_j^{(1)} − 1}, be a fixed B_1-adic rational number. Let µ = 0.µ_0µ_1...µ_{ν−1}, where for 0 ≤ j ≤ ν−1, µ_j ∈ {0, 1, ..., b_j^{(2)} − 1}, be a fixed B_2-adic rational number.

The different choices of the parameters κ and µ permit us to obtain a class of two-dimensional nets with similar construction and distribution properties of the points in the plane. For an arbitrary i such that 0 ≤ i ≤ B_ν − 1 let us denote

η_ν^κ(i) = η_ν(i) ⊕_{B_1}^{[0,1)} κ and z_ν^µ(i) = p_{B_2}(i) ⊕_{B_2}^{[0,1)} µ.

Definition 2. Let ν ≥ 1 be an arbitrary and fixed integer. Let κ and µ be as above. The two-dimensional net

Z_{𝓑_2,ν}^{κ,µ} = {(η_ν^κ(i), z_ν^µ(i)) : 0 ≤ i ≤ B_ν − 1}

we will call a net of Zaremba-Halton type constructed in the generalized 𝓑_2-adic system.
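Definition 2 can be checked numerically. The Python sketch below is a hypothetical re-implementation of the construction (all names are ours): κ and µ are supplied as digit lists aligned with B_1 and B_2, the B_2-digits of i give z_ν^µ(i), and the same digits reversed are the B_1-digits of η_ν(i) = i/B_ν. With b_1 = (3, 2, 5), κ-digits (1, 0, 1) and µ-digits (2, 1, 3) it reproduces the 30 points of the first example in Section 3, starting with (11/30, 15/30).

```python
def digits_mixed(i, bases, nu):
    """First nu mixed-radix digits of i (least significant first)."""
    ds = []
    for j in range(nu):
        i, d = divmod(i, bases[j])
        ds.append(d)
    return ds

def shift(xs, ks, bases):
    """Digitwise addition modulo b_j: the operation (+)_B on digit lists."""
    return [(a + k) % b for a, k, b in zip(xs, ks, bases)]

def value(ds, bases):
    """Turn a digit list (d_0, d_1, ...) into sum_j d_j / B_{j+1}."""
    x, B = 0.0, 1
    for d, b in zip(ds, bases):
        B *= b
        x += d / B
    return x

def zaremba_halton_net(b1, kappa, mu):
    """Points (eta_nu^kappa(i), z_nu^mu(i)), 0 <= i < B_nu, per Definition 2."""
    nu = len(b1)
    b2 = b1[::-1]                 # b_j^{(2)} = b_{nu-1-j}^{(1)}
    B_nu = 1
    for b in b1:
        B_nu *= b
    pts = []
    for i in range(B_nu):
        d = digits_mixed(i, b2, nu)   # B2-digits of i; reversed: B1-digits of i/B_nu
        eta = value(shift(d[::-1], kappa, b1), b1)
        z = value(shift(d, mu, b2), b2)
        pts.append((eta, z))
    return pts
```

Since the digit shift is a bijection, the first coordinates of the B_ν points are exactly the fractions j/B_ν in shuffled order.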


In our previously mentioned paper we investigated and proved the "very good" uniform distribution of these nets. Our interest in these types of sequences stems from their usage in numerical integration, construction of random number generators, etc.

3 Visualization results for nets of Zaremba-Halton type

We use the above mathematical model to present a version of a program in Mathematica which can construct an arbitrary net of Zaremba-Halton type for the chosen parameters. Here we confine ourselves to presenting a visualization of some concrete nets from the class Z_{𝓑_2,ν}^{κ,µ} and to showing the distribution of the points of these nets. Some theoretical investigations of the nets from the class Z_{𝓑_2,ν}^{κ,µ} have been carried out in another paper of the authors. Our code, for some choices of the parameters, is presented below:

(* parameters *)
nu = 3; mi = {2, 1, 3}; ni = {1, 0, 1};
b2[0] = 5; b2[1] = 2; b2[2] = 3; b2v = {5, 2, 3};
b1[0] = 3; b1[1] = 2; b1[2] = 5; b1v = {3, 2, 5};
B1[0] = B2[0] = 1; niza = {};
(* end parameters *)

Do[B1[j] = B1[j - 1]*b1[j - 1];
   B2[j] = B2[j - 1]*b2[j - 1], {j, nu}]

For[i = 0, i < b2[2], i++,
  For[j = 0, j < b2[1], j++,
    For[k = 0, k < b2[0], k++,
      pc = {i, j, k};
      p = Reverse[pc];
      z = Mod[mi + p, b2v];
      zv = Sum[z[[l]]/B2[l], {l, nu}];
      et = Mod[ni + pc, b1v];
      etv = Sum[et[[l]]/B1[l], {l, nu}];
      AppendTo[niza, {etv, zv}]
    ]]]

We illustrate the work of the code by some examples. In the first example we make the following choice of the parameters: ν = 3, b_0^{(1)} = 3, b_1^{(1)} = 2, b_2^{(1)} = 5, b_0^{(2)} = 5, b_1^{(2)} = 2, b_2^{(2)} = 3, κ = 0.101 and µ = 0.213. The distribution of the points of this net is given in Fig. 1.



Fig. 1. b_0^{(1)} = 3, b_1^{(1)} = 2, b_2^{(1)} = 5, κ = 0.101 and µ = 0.213.

The points of the net Z_{𝓑_2,ν}^{κ,µ} are:

Z_{𝓑_2,ν}^{κ,µ} = { (11/30, 15/30), (12/30, 21/30), (13/30, 27/30), (14/30, 3/30), (10/30, 9/30), (16/30, 12/30), (17/30, 18/30), (18/30, 24/30), (19/30, 0/30), (15/30, 6/30), (21/30, 16/30), (22/30, 22/30), (23/30, 28/30), (24/30, 4/30), (20/30, 10/30), (26/30, 13/30), (27/30, 19/30), (28/30, 25/30), (29/30, 1/30), (25/30, 7/30), (1/30, 17/30), (2/30, 23/30), (3/30, 29/30), (4/30, 5/30), (0/30, 11/30), (6/30, 14/30), (7/30, 20/30), (8/30, 26/30), (9/30, 2/30), (5/30, 8/30) }.


Fig. 2. b_0^{(1)} = 2, b_1^{(1)} = 9, b_2^{(1)} = 7, b_3^{(1)} = 16, κ = 0.180F and µ = 0.F101.


In the second example we take ν = 4, b_0^{(1)} = 2, b_1^{(1)} = 9, b_2^{(1)} = 7, b_3^{(1)} = 16, b_0^{(2)} = 16, b_1^{(2)} = 7, b_2^{(2)} = 9, b_3^{(2)} = 2, κ = 0.180F and µ = 0.F101, where F is the symbol for the number 15 in the number system with base 16. The distribution of the points of the second net is given in Fig. 2. In the third example (see Fig. 3) we make the following choice of the parameters: ν = 4, b_0^{(1)} = 2, b_1^{(1)} = 3, b_2^{(1)} = 7, b_3^{(1)} = 11.


Fig. 3. b_0^{(1)} = 2, b_1^{(1)} = 3, b_2^{(1)} = 7, b_3^{(1)} = 11.

In the next example (see Fig. 4) we make the following choice of the parameters: ν = 6, b_0^{(1)} = 2, b_1^{(1)} = 3, b_2^{(1)} = 2, b_3^{(1)} = 5, b_4^{(1)} = 5, b_5^{(1)} = 3.


Fig. 4. b_0^{(1)} = 2, b_1^{(1)} = 3, b_2^{(1)} = 2, b_3^{(1)} = 5, b_4^{(1)} = 5, b_5^{(1)} = 3.


Some more examples with ν = 6 are presented in Fig. 5 (b_0^{(1)} = 2, b_1^{(1)} = 3, b_2^{(1)} = 4, b_3^{(1)} = 5, b_4^{(1)} = 6, b_5^{(1)} = 8) and Fig. 6 (b_0^{(1)} = 13, b_1^{(1)} = 11, b_2^{(1)} = 7, b_3^{(1)} = 5, b_4^{(1)} = 3, b_5^{(1)} = 2).


Fig. 5. b_0^{(1)} = 2, b_1^{(1)} = 3, b_2^{(1)} = 4, b_3^{(1)} = 5, b_4^{(1)} = 6, b_5^{(1)} = 8.


Fig. 6. b_0^{(1)} = 13, b_1^{(1)} = 11, b_2^{(1)} = 7, b_3^{(1)} = 5, b_4^{(1)} = 3, b_5^{(1)} = 2.


4 Conclusion

We made visualizations with the mathematical software Mathematica of the two-dimensional nets Z_{𝓑_2,ν}^{κ,µ} of Zaremba-Halton type constructed in the generalized 𝓑_2-adic system. We show some examples with different numbers of points, different bases, and different shift vectors. All of the realized experiments with graphical representations, where the number of points in the nets was between 24 and 2,000,000, confirm very well the theoretical results about the uniform distribution of the points of the introduced nets, obtained in our previously mentioned paper.

References

1. Chrestenson, H. E.: A class of generalized Walsh functions. Pacific J. Math. 5, 17–31 (1955)
2. Cristea, L. L., Pillichshammer, F.: A lower bound on the b-adic diaphony. Rend. Mat. Appl. 27(2), 147–153 (2007)
3. Grozdanov, V.: The generalized b-adic diaphony of the net of Zaremba-Halton over finite abelian groups. Comp. Rendus Akad. Bulgare Sci. 57(6), 43–48 (2004)
4. Grozdanov, V., Nikolova, E. V., Stoilova, S. S.: Generalized b-adic diaphony. Comp. Rendus Akad. Bulgare Sci. 56(4), 137–142 (2003)
5. Grozdanov, V., Stoilova, S. S.: On the theory of b-adic diaphony. Comp. Rendus Akad. Bulgare Sci. 54(3), 31–34 (2001)
6. Halton, J. H.: On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numer. Math. 2, 84–90 (1960)
7. Halton, J. H., Zaremba, S. K.: The extreme and L2 discrepancies of some plane sets. Monatsh. Math. 73, 316–328 (1969)
8. Hammersley, J. M.: Monte Carlo methods for solving multivariable problems. Ann. New York Acad. Sci. 86(3), 844–874 (1960)
9. Larcher, G., Niederreiter, H., Schmid, W. Ch.: Digital nets and sequences constructed over finite rings and their applications to quasi-Monte Carlo integration. Monatsh. Math. 121(3), 231–253 (1996)
10. Proinov, P. D.: On irregularities of distribution. Comp. Rendus Akad. Bulgare Sci. 39(9), 31–34 (1986)
11. Roth, K.: On irregularities of distribution. Mathematika 1(2), 73–79 (1954)
12. Van der Corput, J. G.: Verteilungsfunktionen I. Proc. Akad. Wetensch. Amsterdam 38(8), 813–821 (1935)
13. Walsh, J. L.: A closed set of normal orthogonal functions. Amer. J. Math. 45, 5–24 (1923)
14. Zinterhof, P.: Über einige Abschätzungen bei der Approximation von Funktionen mit Gleichverteilungsmethoden. S. B. Akad. Wiss., math.-naturw. Klasse, Abt. II 185, 121–132 (1976)


Minimizing Total Weight of Feedback Arcs in Flow Graphs

Marija Mihova, Ilinka Ivanoska, Kristijan Trifunovski, Kire Trivodaliev

Ss. Cyril and Methodius University, Faculty of Computer Science and Engineering, 1000 Skopje, Macedonia {marija.mihova,ilinka.ivanoska,kire.trivodaliev}@finki.ukim.mk, [email protected]

Abstract. The problem of determining the minimal weight feedback edge set in an oriented weighted graph when the graph represents a network flow is the focus of this paper. The proposed solution is especially interesting since it can be directly applied in solving the linearization of genome sequence graphs (GSG), which represent a set of individual haploid reference genomes as paths in a single graph and can capture frequent human genetic variations, including translocations, inversions, deletions, and insertions. The linearization is essential for the efficiency of storing and traversal as well as visualization of the graph. In this paper, properties of the flow function are used to obtain some features of the optimal ordering of nodes when the weight of the feedback edge set is minimal.

Keywords: Two-terminal flow network · Feedback arcs · Linearization · Flow graph.

1 Introduction

In graph theory, a feedback arc set, or feedback edge set, in an unweighted directed graph is a set of edges which, when removed from the graph, leaves a directed acyclic graph, or DAG. In fact, it is a set containing at least one edge of every cycle in the graph. Finding a minimal set with this property is a key step in layered graph drawing [1]. In a weighted graph this operation can be defined as a minimization of the total weight of feedback arcs, i.e. the minimum weight of a set of edges whose removal from the graph leaves a DAG. This problem has various applications, the most prominent ones being tournament graphs [5] and genome sequence graphs [3]. The decision version of the problem is NP-complete and is one of Richard M. Karp's NP-complete problems [6]. There are some algorithms whose running time is a fixed polynomial in the size of the input graph, but exponential in the number of edges in the feedback arc set [2] or in the dimension of a cycle space [4], although some special cases of tournament graphs can be solved in polynomial time. The focus of application in this paper is finding the minimal feedback edge set of a Genome Sequence Graph (GSG) [9], a graph used to represent human genetic variation in the reference human genome. Each node of a GSG represents a single


DNA base that occurs at an orthologous locus in one or more of the haploid genomes represented. Each arc corresponds to an adjacency that occurs between consecutive instances of bases in the represented genomes, and it is directed according to the default strand direction of the DNA sequence. A weight is added to each arc in order to emphasize the importance of the arc in typical applications running on the graph, for example the number of times the arc is traversed in the reference genomes used to build the graph. The representation of a GSG matters for the efficiency and effectiveness of operations such as access, traversal, and visualization, all of which depend on the GSG structure/representation; the best GSGs are ones that have their nodes ordered in a straight line, a structure obtained in a process known as linearization of the graph. Usually the linearization of a sequence graph aims to lessen the total weight of all feedback arcs, i.e. the weighted feedback, as much as possible. On the other hand, a GSG can be easily modified into a graph that represents a flow, which means that the inflow into each node representing a genome is equal to the outflow out of that node. This characteristic gives specific features to the minimal weighted feedback edge ordering, so identifying and distinguishing some of them is the main interest of this paper. The rest of this paper is organized as follows. The second section presents the terminology and definitions from network flow theory used in the research. The next section defines the linearization of Genome Sequence Graphs as a network flow problem. Section 4 presents our findings in terms of the specific features the optimal ordering of nodes has when the weight of the feedback edge set is minimal. The paper is concluded in the fifth section.

2 Flow Function and Minimal Path Vectors

A flow in an undirected network G(V, E, c) is a function f : V × V → R⁺ ∪ {0} that satisfies the following three constraints:

1. Capacity constraint: 0 ≤ f(u, v) ≤ c({u, v}), for all u, v ∈ V, i.e., the flow of an edge cannot exceed its capacity.

2. Flow conservation: for each v ∈ V,

f(V, v) − f(v, V) = ∑_{u∈V} f(u, v) − ∑_{w∈V} f(v, w) = { 0, v ∉ {s, t}; −|f|, v = s; |f|, v = t },   (1)

where |f| is the value of the flow. In other words, the total flow entering a node v, f(V, v), must be equal to the total flow leaving node v, f(v, V), for all v ∈ V \ {s, t}; the flow leaving s and the flow entering t are equal to the value of the flow.

3. For all vertices u, v ∈ V, if f(u, v) > 0, then f(v, u) = 0. In other words, each flow uses a given edge in only one direction. It is assumed that if there is no edge {u, v}, i.e. {u, v} ∉ E, then f(u, v) = 0.
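The three constraints can be verified mechanically. The checker below is an illustrative sketch (the dict-of-arcs representation of f and the function name are our choices); it tests conservation and one-directional use, leaving the capacity constraint to the caller:

```python
def is_flow(f, V, s, t):
    """Check flow conservation and one-directional edge use for f: {(u, v): value}."""
    # constraint 3: no pair of vertices carries flow in both directions
    if any(f.get((u, v), 0) > 0 and f.get((v, u), 0) > 0 for u in V for v in V):
        return False
    def excess(v):
        # inflow minus outflow at v, as in equation (1)
        return sum(f.get((u, v), 0) for u in V) - sum(f.get((v, w), 0) for w in V)
    value = -excess(s)                       # the flow leaving s is |f|
    return all(excess(v) == 0 for v in V - {s, t}) and excess(t) == value

V = {"s", "a", "b", "t"}
f = {("s", "a"): 2, ("s", "b"): 1, ("a", "t"): 2, ("b", "t"): 1}
```

For this f the value of the flow is 3: it leaves s, is conserved at a and b, and enters t.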


In the rest of the paper we will use the term flow graph for a weighted directed graph with flow as a weight function. Here we give some additional definitions that help define the problem and explain the algorithm.

Definition 1. Let G(V, E, c) be a two-terminal flow network. For a flow f, we define the flow vector x⃗ induced by f by

x_i = f(u, v) = f(e_i),   (2)

where e_i = {u, v}.

Definition 2. Given a two-terminal undirected flow network G(V, E, c), the vector x⃗ is a path vector to level d, d-P, if and only if there is a flow f with value |f| = d such that for all edges e_i = {u, v}, f(u, v) ≤ x_i. In other words, x⃗ is a path vector to level d if and only if a flow d may be delivered in the two-terminal network G(V, E, x⃗).

Definition 3. The state vector x⃗ is a minimal path vector to level d, d-MP, if and only if the two-terminal flow network G(V, E, x⃗) has maximum flow d, and for each y⃗ ≤ x⃗ the two-terminal flow network G(V, E, y⃗) has a maximum flow less than or equal to d.

Theorem 1. The state vector x⃗ is a d-MP for the two-terminal flow network G(V, E, c) iff x⃗ corresponds to a flow function with flow d and the corresponding graph G(V, E, x⃗) has no cycles [7, 8].

Corollary 1. Each 0-MP is a sum of simple cycles c⃗_i.

Corollary 2. If x⃗ is a flow vector for flow d and c⃗_i are the simple cycles contained in it, then

y⃗ = x⃗ − ∑_i c⃗_i is a d-MP.

Example 1. Let us consider the graph given in Fig. 1 a). This graph contains cycles, and one of those cycles carries a flow equal to 2. We can take that flow out of the graph, thus obtaining the graph presented in Fig. 1 b). The obtained graph again contains a cycle, and by removing this cycle we obtain the graph given in Fig. 2. Under a suitable ordering of the nodes, the graph in Fig. 2 has no feedback edges. Moreover, if we order the nodes of the graph in Fig. 1 a) in the same way, the total weight of the feedback edges will be 3, which is the minimal possible weighted feedback of this graph.
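The cycle-removal step of Example 1, i.e. Corollary 2, can be sketched as follows (an illustrative Python implementation with our own names; the flow is a dict mapping arcs to positive weights, and each round subtracts the minimum weight along some directed cycle):

```python
def find_cycle(flow):
    """Return a directed cycle [v, ..., v] in the positive-flow graph, or None."""
    adj = {}
    for (u, v), w in flow.items():
        if w > 0:
            adj.setdefault(u, []).append(v)
    color, stack = {}, []
    def dfs(u):
        color[u] = 1                      # 1 = on the current DFS path
        stack.append(u)
        for v in adj.get(u, []):
            if color.get(v, 0) == 1:      # back edge closes a cycle
                return stack[stack.index(v):] + [v]
            if color.get(v, 0) == 0:
                cyc = dfs(v)
                if cyc:
                    return cyc
        color[u] = 2
        stack.pop()
        return None
    for u in list(adj):
        if color.get(u, 0) == 0:
            cyc = dfs(u)
            if cyc:
                return cyc
    return None

def remove_cycles(flow):
    """Subtract each cycle's minimum flow until the graph is acyclic (Corollary 2)."""
    f = dict(flow)
    while (cyc := find_cycle(f)) is not None:
        arcs = list(zip(cyc, cyc[1:]))
        m = min(f[a] for a in arcs)
        for a in arcs:
            f[a] -= m
    return {a: w for a, w in f.items() if w > 0}
```

On a toy flow with the single cycle a → b → c → a, the call removes that cycle's minimum weight and leaves an acyclic flow, exactly as in the example.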


a) b)

Fig. 1. The graph from Example 1. a) A cycle with capacity 2 that should be removed from the graph. b) The graph obtained from a).

Fig. 2. Acyclic graph obtained by removing cycles from the graph shown in Fig. 1.

3 Linearization of Genome Sequence Graphs

A Genome Sequence Graph (GSG), Fig. 3, is a representation of DNA sequences that helps to incorporate human genetic variation (translocations, inversions, deletions, and insertions) into the reference human genome [9].

Fig. 3. Genome sequence graph for CATCGCT, CATCGT, CAGCGAT and CATCGAGAGAGCT.

One of the most important operations on GSGs is graph linearization, essential both for efficiency of storage and access, as well as for natural graph visualization and compatibility with other tools. The linearization of a sequence graph aims to lessen the total weight of all feedback arcs, i.e. the weighted feedback, as much as possible. In the rest of the paper we will denote a linearly ordered set of nodes by ⟨v1, v2, …, vn⟩. The GSG can be modified to carry a flow function in the following manner. First, we add a special (artificial) source and a special (artificial) sink to the set of nodes. For each node that represents a starting DNA base, we add an arc from the


source to it, with a weight equal to the number of genomes that start with it. For each node that represents an ending DNA base, we add a weighted arc from that node to the sink. The weight of that arc is equal to the number of genomes that end with that node, Fig. 4.

Fig. 4. Modification of the initial DNA graph by adding source and sink.

This way we obtain a graph whose weight function is in fact a flow function, because the total flow out of the source is equal to the total flow entering the sink, and for all other nodes the total flow leaving a node is equal to the total flow entering it [7, 8]. In fact, such a graph represents a network flow function with flow equal to the number of genome sequences. The max-flow M of this network is equal to the total flow out of the source, so any minimal path vector for level M, i.e. any minimal flow function for flow M, corresponds to an acyclic graph. The features of these minimal paths can help in minimizing the weighted feedback of the graph.
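The source/sink modification described above can be sketched in a few lines of Python (illustrative only; the labels SRC and SNK for the artificial terminals are ours):

```python
def gsg_to_flow(arcs, starts, ends):
    """Attach an artificial source and sink so the GSG weights form a flow.

    arcs   -- {(u, v): weight} of the GSG
    starts -- {v: number of genomes starting at v}
    ends   -- {v: number of genomes ending at v}
    """
    flow = dict(arcs)
    for v, w in starts.items():
        flow[("SRC", v)] = w     # one arc per starting base, weighted by genome count
    for v, w in ends.items():
        flow[(v, "SNK")] = w     # one arc per ending base, weighted by genome count
    return flow

# Two genomes A->B and A->C: both start at A, one ends at B and one at C.
f = gsg_to_flow({("A", "B"): 1, ("A", "C"): 1}, {"A": 2}, {"B": 1, "C": 1})
```

The total weight leaving SRC equals the number of genomes, which is the max-flow M of the resulting network.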

4 Total Weight of Feedback Arcs and Min-Paths

The first new theorem in this part says that if some ordering of the nodes corresponds to the minimal weight of feedback edges, then this ordering also corresponds to a topological sort of some minimal path.

Theorem 2. Let G(V, E, c⃗) be a flow graph with flow d, where |V| = n, and let ⟨v1, v2, …, vn⟩ be an ordering of nodes with a minimal weight of feedback edges. Then there is a d-MP x⃗ for which ⟨v1, v2, …, vn⟩ is a topological sort of the graph G(V, E, x⃗).

Proof. Assume that there is an ordering of nodes with a minimal weight of feedback edges such that the flow in the graph after removing the feedback edges is d' < d. In the graph obtained from G by removing the feedback edges we can find some d'-MP x⃗. It is obvious that the graph G'(V, E, c − x) is a flow graph of level d − d' and that any flow of level d − d' uses a collection of feedback edges (v_{j1}, v_{i1}), (v_{j2}, v_{i2}), …, (v_{jk}, v_{ik}) with weights d_1, d_2, …, d_k. Moreover, we can create a new graph G'' by adding a source s' and a sink t' to the subgraph induced by V' = {min{v_{im}}, …, max{v_{jm}}}, together with the arcs (s', v_{im}) and (v_{jm}, t') of weights d_m.


Lemma 1. Let G(V, E, c) be a flow graph where |V| = n, and let ⟨v1, v2, …, vn⟩ be a node ordering with a minimal weight of feedback edges. If (v_j, v_i) is a feedback edge with weight d_1, there is a flow d_1 from v_i to v_j in the subgraph G_1(V_1 = {v_i, …, v_j}, E/V_1, x⃗/V_1).

Proof. Suppose that the nodes are ordered such that the weight of the feedback edges is minimal, but we have a feedback edge (v_j, v_i) with weight d_1 and there is no flow d_1 from v_i to v_j in the subgraph G_1(V_1 = {v_i, …, v_j}, E/V_1, x⃗/V_1). Then there is a cut that decomposes V_1 into sets A and B such that v_i ∈ A, v_j ∈ B, and ∑_{v_k∈A, v_l∈B} x_{kl} < d_1 (x_{kl} is the weight of the edge from v_k to v_l). If we swap the nodes of the set A and the set B, retaining the order of the nodes inside each of them the same as before, the situation with the feedback edges becomes the following:

─ All edges inside the sets {v_1, …, v_{i−1}}, {v_{j+1}, …, v_n}, A and B remain in the same orientation, so we do not have any changes in the weight of the feedback edges among these edges.
─ All edges between {v_1, …, v_{i−1}} and A, {v_1, …, v_{i−1}} and B, {v_{j+1}, …, v_n} and A, and {v_{j+1}, …, v_n} and B remain in the same orientation, so again we do not have any changes in the weight of the feedback edges among these edges.
─ The edges between A and B turn in the opposite direction, so because ∑_{v_k∈A, v_l∈B} x_{kl} < d_1, we obtain feedback edges with total weight ∑_{v_k∈A, v_l∈B} x_{kl}, which is lower than the weight d_1 of the feedback edge we had before this transformation.

Hence, we have shown that we can obtain a graph with a smaller total weight of the feedback edges, which is in contradiction with our assumption that the starting ordering has minimal weight of feedback edges.

Let us illustrate the proof of this lemma by the following example:

Example 2. Let us consider the graph given in Fig. 5. One random ordering of the nodes is given in the picture. The edge (d, a) is a feedback edge that lies on the cycle with capacity 1. Removing this edge, together with the nodes s and t, the residual graph has min cut (A = {a, b, c}, B = {d}). By swapping the nodes from the sets A and B, retaining the order of the nodes inside each of them the same as before, we obtain the ordering illustrated in Fig. 6. It is obvious that the new order has a smaller weight of the feedback edges.


Fig. 5. The graph from Example 2. In this ordering the edge (d, a) is a feedback edge making the cycle with capacity 1.

Fig. 6. New linearization of the graph from Fig. 5, obtained by swapping the sets A and B.

The new linearization is the best possible, since we cannot further reduce the weight of the feedback arcs. In fact, one feedback edge is (c, d), but the flow in the graph G' = ({d, a, b, c}, V/{d, a, b, c}) with source d and sink c is greater than 2. We have the same situation with the feedback edge (c, b), since the flow in the graph G'' = ({b, c}, V/{d, a, b, c}) with source b and sink c is greater than 1. Because we can remove these cycles from the graph (Fig. 7), we may conclude that further improvement is not possible.

Fig. 7. The graph obtained from the one in Fig. 6 by removing all cycles.

The following lemma is a generalization of Lemma 1, and its proof is very similar to the proof of Lemma 1, so we state it without proof, but we illustrate it by an example.

Lemma 2. Let G(V, E, c) be a GSG and let ⟨v1, v2, …, vn⟩ be a node ordering with a minimal weight of feedback edges. If (v_{j1}, v_{i1}), (v_{j2}, v_{i2}), …, (v_{jk}, v_{ik}) is a collection of feedback edges with weights d_1, d_2, …, d_k, then there is a flow d_1 + d_2 + … + d_k from s' to t' in the graph G' obtained from the forward edges of the subgraph induced by V' = {min{v_{im}}, …, max{v_{jm}}} by adding a source s' and a sink t' together with the arcs (s', v_{im}) and (v_{jm}, t') of weights d_m, m = 1, …, k.


Example 3. The ordering of the graph in Fig. 8 is such that the total weight of the feedback edges is 4. We modify it so that we delete the feedback edges, add a source s' and a sink t', and add the edges (s', a) and (c, t') with weight 3, and (s', b) and (d, t') with weight 3. The flow through the second graph is 3, i.e. smaller than the total weight of the feedback edges. The cut is between the nodes b and c, so we swap the sets {a, b} and {c, d}. The obtained order has a minimal weight of the feedback edges, equal to 3.

a)

b)

c)

Fig. 8. The graph with weight 4 of the feedback edges and the subgraph induced from the forward edges, modified according to Lemma 2. The last figure illustrates the same graph after swapping the sets {a, b} and {c, d}, leading to a graph with feedback edge weight equal to 3.

Note that Lemma 2 does not provide sufficient conditions for a minimal weight feedback edge set, i.e. it is not proven that an ordering in which every collection of feedback edges satisfies the condition of Lemma 2 has a minimal weight feedback edge set; this is illustrated by Example 4.

Example 4. In the ordering in Fig. 9 a) we have 2 feedback edges, which means 3 possible collections: {(d, a)}, {(f, c)}, {(d, a), (f, c)}. The conclusions of Lemmas 1 and 2 are satisfied, but we have a more optimal ordering, illustrated in Fig. 9 b). In fact, following the approach of Lemma 2, the graph G' induced from the graph in Fig. 9 a) is given in Fig. 9 c). The characteristic property of these orderings is that in the graph G' defined in Lemma 2 we can find a flow of level 2, but the edges used in that flow, together with the feedback edges, form only one simple cycle instead of 2.


a)

b)

c)

Fig. 9. a) An ordering of a graph with total weight 2 of the feedback edges. b) An ordering of the same graph with total weight 2 of the feedback edges. c) The graph G' defined in Lemma 2.

The last example leads to the idea of the following theorem, which gives a sufficient condition for a given ordering to have a minimal weight feedback edge set:

Theorem 3. Let G(V, E, c) be a GSG and let ⟨v1, v2, …, vn⟩ be a node ordering such that (v_{j1}, v_{i1}), (v_{j2}, v_{i2}), …, (v_{jk}, v_{ik}) are all the feedback edges, having weights d_1, d_2, …, d_k. We form a flow network G' with source s' and sink t' from the forward edges of the subgraph induced by V' = {min{v_{im}}, …, max{v_{jm}}}, adding the arcs (s', v_{im}) and (v_{jm}, t') of weights d_m. If there is a flow of level d_1 + d_2 + … + d_k in G' whose edges, together with the feedback edges, form exactly k simple cycles, one through each feedback edge, then the ordering ⟨v1, v2, …, vn⟩ has a minimal weight of feedback edges.

Proof. If the conditions of the theorem hold, then any feedback edge is part of exactly one cycle, and for each simple cycle there is exactly one feedback edge. This means that it is impossible to remove feedback edges of smaller total weight from the graph and still obtain a directed acyclic graph as a result.

The proofs of the lemmas given in this section lead to the following algorithm for reducing the total feedback weight:

Algorithm 1.
Step 1. Order the nodes of the graph G in a topological sort corresponding to some d-MP, where d is the maximum flow of the network.
Step 2. While there is a collection of feedback edges that does not satisfy the conditions of Lemma 2:
Step 2.1. Create the graph G' given by Lemma 2 and find the min cut (U, V).


Step 2.2. Swap the sets of nodes U and V to obtain a new ordering of the nodes in the graph G.
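The objective that Algorithm 1 minimizes, i.e. the total weight of feedback arcs under a given ordering, can be computed directly, and for toy graphs an optimal ordering can be found by exhaustive search. The sketch below is our own illustrative code (the brute force is only a reference, since the decision problem is NP-complete); it uses a small graph with a single weighted cycle:

```python
from itertools import permutations

def feedback_weight(order, arcs):
    """Total weight of the arcs (u, v) with u placed after v in `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return sum(w for (u, v), w in arcs.items() if pos[u] > pos[v])

arcs = {("a", "b"): 3, ("b", "c"): 1, ("c", "a"): 1, ("b", "d"): 2}
best = min(permutations(["a", "b", "c", "d"]),
           key=lambda o: feedback_weight(o, arcs))
```

Here the graph has one simple cycle a → b → c → a, so any ordering must turn at least one of its arcs backwards; the minimum achievable weighted feedback is 1, attained e.g. by the order ⟨a, b, c, d⟩.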

It is obvious that this algorithm is much better than the competing algorithm proposed in [3] in terms of complexity, while we guarantee that the total weight of the feedback arcs will be at most the one obtained in [3]. Even in the examples used in [3] to illustrate their procedure, the algorithm proposed in this research finds the optimal ordering either after Step 1 or in only one iteration of Step 2, which is significantly less than what the procedure of [3] requires.

5 Conclusion and Future Work

This paper presents some theoretical results that show the connection between the ordering of the nodes of a directed graph with minimal feedback weight and the flow function of the corresponding weighted flow graph. The necessary conditions for a node ordering to be optimal are given; furthermore, the proofs of Lemmas 1 and 2 provide a procedure for reducing the weight of the feedback edges. The results lead to an algorithm for reducing feedback edges and obtaining a linearized graph. Theorem 3 gives the sufficient condition for an optimal ordering; however, the procedure for obtaining one cannot be derived in a straightforward manner. The next steps in furthering this research will include defining a procedure that will allow determining the optimal solution taking into account the sufficient-conditions theorem, as well as constructing a more detailed and efficient algorithm for reducing the number of feedback edges by selectively choosing which starting feedback edge subsets will be taken into consideration. The final steps will include a complexity analysis of the new optimal algorithm and experimental verification of its practical usability.

References

1. Bastert, O., Matuszewski, C.: Layered drawings of digraphs. In: Drawing Graphs, pp. 87–120. Springer (2001)
2. Chen, J., Liu, Y., Lu, S., O’Sullivan, B., Razgon, I.: A fixed-parameter algorithm for the directed feedback vertex set problem. Journal of the ACM (JACM) 55(5), 21 (2008)
3. Haussler, D., Smuga-Otto, M., Eizenga, J.M., Paten, B., Novak, A.M., Nikitin, S., Zueva, M., Miagkov, D.: A flow procedure for linearization of genome sequence graphs. Journal of Computational Biology 25(7), 664–676 (2018)
4. Hecht, M.: Exact localisations of feedback sets. Theory of Computing Systems 62(5), 1048–1084 (2018)
5. Karpinski, M., Schudy, W.: Faster algorithms for feedback arc set tournament, Kemeny rank aggregation and betweenness tournament. In: International Symposium on Algorithms and Computation, pp. 3–14. Springer (2010)
6. Karp, R.M.: Reducibility among combinatorial problems. In: Complexity of Computer Computations, pp. 85–103. Springer (1972)


7. Mihova, M., Stojkovikj, N., Jovanov, M., Stankov, E.: On maximal level minimal path vectors of a two-terminal network. International Journal of Olympiads in Informatics 8(1), 133–144 (2014)
8. Mihova, M., Stojkovikj, N., Jovanov, M., Stankov, E.: Maximal level minimal path vectors of a two-terminal undirected network. IEEE Transactions on Reliability 65(1), 282–290 (2015)
9. Paten, B., Novak, A., Haussler, D.: Mapping to a reference genome structure. arXiv preprint arXiv:1404.5010 (2014)


HPM-CFSI: A High-Performance Algorithm for Mining Closed Frequent Significance Itemsets

Huan Phan1, 2 and Bac Le3

1 Division of IT, University of Social Sciences and Humanities, VNU-HCMC, Ho Chi Minh City, Vietnam [email protected] 2 Faculty of Mathematics and Computer Science, University of Science, VNU-HCMC, Ho Chi Minh City, Vietnam 3 Faculty of IT, University of Science, VNU-HCMC, Ho Chi Minh City, Vietnam [email protected]

Abstract. In traditional closed frequent itemset mining on transactional databases, items have no weight (equal weight, equal to 1). However, in real-world applications each item usually has a different weight (the importance/significance of the item). Therefore, we need to mine weighted/significance itemsets on transactional databases. In this paper, we propose a high-performance algorithm called HPM-CFSI for mining closed frequent significance itemsets, based on an approach that does NOT satisfy the downward closure property (a great challenge). The experimental results show that the proposed algorithm performs better than other existing algorithms on both real-life and synthetic datasets.

Keywords: Closed frequent significance itemsets · High-performance · HPM-CFSI algorithm · Downward closure property

1 Introduction

For more than two decades, most of the research on mining frequent itemsets has assumed that the weights/significance of all items are the same (equal weight, equal to 1), with algorithmic approaches based on Apriori [1]. In addition, to speed up the mining of association rules, authors around the world have proposed mining closed frequent itemsets (CFI); CFI are non-redundant representations of all frequent itemsets, as in the algorithms NOV-CFI [2], A-CLOSE [3], CHARM [4], CLOSET+ [5], and CFIM-P [6]. Some of these algorithms generate candidates using a breadth-first search with the tidset format; others improve upon this with a depth-first search and the diffset format. The main limitation of the algorithms based on the lattice framework (IT-Tree) was addressed by a major advance developed using pattern growth based on the FP-Tree. In real-world applications, items can have different significance/importance; such databases are called weighted databases. Most algorithms for frequent weighted/significance itemset mining are based on an approach that satisfies the downward closure property, such as the algorithms in [7-9]. However, Huai et al. [10] proposed


Apriori-like algorithms based on an approach that does NOT satisfy the downward closure property (very few proposed algorithms follow this approach). This is a great challenge. In this paper, we propose a high-performance algorithm called HPM-CFSI for mining closed frequent significance itemsets (CFSI) based on an approach that does NOT satisfy the downward closure property. The algorithms presented in the paper are as follows:
─ Algorithm 1: Computing the Kernel_COOC array of co-occurrences and occurrences of each kernel item in at least one transaction;
─ Algorithm 2: Building the list of nLOOC_Trees based on the Kernel_COOC array;
─ Algorithm 3: Generating CFSI based on the list of nLOOC_Trees and the Kernel_COOC array.
This paper is organized as follows: in Section 2, we describe the basic concepts for mining closed frequent itemsets and closed frequent significance itemsets, and a data structure for transaction databases. Some theoretical aspects on which our approach relies are given in Section 3, where we also describe our high-performance algorithm to compute closed frequent significance itemsets on transactional databases. Details on implementation and experimental tests are discussed in Section 4. Finally, we conclude with a summary of our approach, perspectives, and extensions of this future work.

2 Background

In this section, we present the basic concepts for mining closed frequent itemsets and an efficient data structure for transaction databases.

2.1 Mining Frequent and Closed Frequent Itemsets

Let I = {i1, i2, …, im} be a set of m distinct items. A set of items X = {i1, i2, …, ik}, ij ∈ I (1 ≤ j ≤ k), is called an itemset; an itemset with k items is called a k-itemset. Let Ɗ be a dataset containing n transactions, a set of transactions T = {t1, t2, …, tn}, where each transaction tj = {ik1, ik2, …, ikl}, ikl ∈ I (1 ≤ kl ≤ m).
Definition 1. The support of an itemset X is the number of transactions in which X occurs as a subset, divided by the number n of transactions in the dataset, denoted sup(X).
Definition 2. Let minsup be the threshold minimum support value specified by the user. If sup(X) ≥ minsup, itemset X is called a frequent itemset; FI denotes the set of all frequent itemsets.
Definition 3. Itemset X is called a closed frequent itemset if sup(X) ≥ minsup and for every itemset Y ⊃ X, sup(Y) < sup(X); CFI denotes the set of all closed frequent itemsets. See the example transaction database Ɗ in Table 1.

Table 1. The transaction database Ɗ used as our running example

TID | Items         TID | Items
t1  | A C E F       t6  | E
t2  | A C G         t7  | A B C E
t3  | E H           t8  | A C D
t4  | A C D F G     t9  | A B C E G
t5  | A C E G       t10 | A C E F G
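Definitions 1-3 can be checked directly on the running example of Table 1 with a brute-force miner. The sketch below (illustrative code, not the paper's algorithm) enumerates all itemsets, keeps the frequent ones, and retains an itemset as closed only when every one-item extension strictly drops its support:

```python
from itertools import combinations

# Transactions of the running example database D (Table 1), as item strings.
D = ["ACEF", "ACG", "EH", "ACDFG", "ACEG", "E", "ABCE", "ACD", "ABCEG", "ACEFG"]
ITEMS = sorted(set("".join(D)))

def sup(X):
    """Support of itemset X (Definition 1): fraction of transactions containing X."""
    return sum(1 for t in D if set(X) <= set(t)) / len(D)

def mine(minsup):
    """Brute-force FI (Definition 2) and CFI (Definition 3)."""
    FI = [frozenset(X) for k in range(1, len(ITEMS) + 1)
          for X in combinations(ITEMS, k) if sup(X) >= minsup]
    # X is closed iff every proper superset has strictly smaller support;
    # it suffices to test the one-item extensions of X.
    CFI = [X for X in FI
           if all(sup(X | {i}) < sup(X) for i in ITEMS if i not in X)]
    return FI, CFI

FI, CFI = mine(0.5)
print(len(FI), sorted("".join(sorted(X)) for X in CFI))  # 11 ['AC', 'ACE', 'ACG', 'E']
```

The output matches the minsup = 0.5 columns of Table 2 (#FI = 11, #CFI = 4), and `mine(0.3)` gives the 19 frequent and 6 closed itemsets of the other two columns.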


Table 2. FI, CFI of Ɗ with minsup = 0.3 and minsup = 0.5

k-itemset | FI (minsup = 0.3), #FI = 19      | CFI (minsup = 0.3), #CFI = 6 | FI (minsup = 0.5), #FI = 11 | CFI (minsup = 0.5), #CFI = 4
1         | F, G, E, A, C                    | E                            | G, E, A, C                  | E
2         | FA, FC, GE, GA, GC, EA, EC, AC   | AC                           | GA, GC, EA, EC, AC          | AC
3         | FAC, GEA, GEC, GAC, EAC          | FAC, GAC, EAC                | GAC, EAC                    | GAC, EAC
4         | GEAC                             | GEAC                         |                             |

Example 1. See Table 1. There are eight different items I = {A, B, C, D, E, F, G, H} and ten transactions T = {t1, t2, …, t10}. Table 2 shows the frequent itemsets and closed frequent itemsets at two different minsup values, 0.3 (30%) and 0.5 (50%), respectively (CFI ⊆ FI).

2.2 Mining Significance Frequent Itemsets and Closed Frequent Itemsets

Let I = {i1, i2, …, im} be a set of m distinct items. A set of items X = {i1, i2, …, ik}, ij ∈ I (1 ≤ j ≤ k), is called an itemset, and an itemset with k items is called a k-itemset. Let Ɗ be a dataset containing n transactions, a set of transactions T = {t1, t2, …, tn}, where each transaction tj = {ik1, ik2, …, ikl}, ikl ∈ I, together with a set of weights/significances SIG = {sigi1, sigi2, …, sigim}, sigik ∈ [0, 1], one per item.
Definition 4. Let X = {i1, i2, …, ik}, ij ∈ I (1 ≤ j ≤ k). The significance of itemset X is computed as sig(X) = max(sigi1, sigi2, …, sigik). The significance support of itemset X is computed as follows:

sigsup(X) = sig(X) × sup(X)    (1)

Definition 5. Let minsigsup be the threshold minimum significance support value specified by the user. If sigsup(X) ≥ minsigsup, itemset X is called a frequent significance itemset; FSI denotes the set of all frequent significance itemsets.

Table 3. Item significances of Ɗ

Item         | A    | B    | C    | D    | E    | F    | G    | H
Significance | 0.55 | 0.70 | 0.50 | 0.65 | 0.40 | 0.60 | 0.30 | 0.80

Example 2. See Tables 1 and 3, with minsigsup = 0.20. Consider itemset X = {A, G}: sup(AG) = 0.50, sig(AG) = max(sigA, sigG) = 0.55, sigsup(AG) = sig(AG) × sup(AG) = 0.55 × 0.50 = 0.275 ≥ minsigsup, so itemset X = {A, G} ∈ FSI. However, for itemset Y = {G}, sigsup(G) = 0.30 × 0.50 = 0.15 < minsigsup, so item G ∉ FSI (the downward closure property is NOT satisfied).
Definition 6. Itemset X is called a closed frequent significance itemset if sigsup(X) ≥ minsigsup and for every itemset Y ⊃ X, sup(Y) < sup(X); CFSI denotes the set of all closed frequent significance itemsets.
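The violation of the downward closure property in Example 2 is easy to reproduce. The hedged sketch below (illustrative code; it restates Tables 1 and 3 as literals) shows that the superset AG passes the minsigsup = 0.20 threshold even though its subset G does not:

```python
# Running example: transactions (Table 1) and item significances (Table 3).
D = ["ACEF", "ACG", "EH", "ACDFG", "ACEG", "E", "ABCE", "ACD", "ABCEG", "ACEFG"]
SIG = {"A": 0.55, "B": 0.70, "C": 0.50, "D": 0.65,
       "E": 0.40, "F": 0.60, "G": 0.30, "H": 0.80}

def sup(X):
    return sum(1 for t in D if set(X) <= set(t)) / len(D)

def sigsup(X):
    """Equation (1): sigsup(X) = sig(X) * sup(X), with sig(X) the max item significance."""
    return max(SIG[i] for i in X) * sup(X)

# AG is a frequent significance itemset (0.275 >= 0.20), but G alone is not (0.15 < 0.20).
print(sigsup("AG"), sigsup("G"))
```

Because sigsup is not anti-monotone, Apriori-style pruning cannot be applied here, which is exactly the challenge the paper addresses.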


Table 4. FSI, CFSI of Ɗ with minsigsup = 0.15

k-itemset | FSI (#FSI = 18)             | CFSI (#CFSI = 9)
1         | F, A, C, E, G               | C, E, G
2         | FA, FC, AC, AG, AE, CG, CE  | AC, CG
3         | FAC, AEG, ACE, ACG, CEG     | FAC, ACG, CEG
4         | ACEG                        | ACEG

2.3 Data Structure for the Transaction Database

The binary matrix is an efficient data structure for mining frequent itemsets [2]. The process begins with the transaction database being transformed into a binary matrix BiM, in which each row corresponds to a transaction and each column corresponds to an item. Each element of the binary matrix BiM contains 1 if the item is present in the current transaction; otherwise it contains 0.
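A row of BiM fits naturally into a machine word. The sketch below (illustrative; the column order A..H follows the running example) encodes a transaction as an integer bit vector, which is what makes the bitwise AND/OR operations of Algorithm 1 in Section 3 cheap:

```python
ITEMS = "ABCDEFGH"  # columns of the binary matrix BiM; A is the leftmost bit

def vectorbit(transaction):
    """Row of BiM as an integer bit vector: the bit of item i is 1 iff i occurs."""
    return sum(1 << (len(ITEMS) - 1 - ITEMS.index(i)) for i in transaction)

print(format(vectorbit("ACEF"), "08b"))  # t1 = {A, C, E, F} -> 10101100
```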

3 The Proposed Algorithms

3.1 Generating Array Contain Co-occurrence Items of Kernel Item

In this section, we illustrate the framework of the algorithm that generates the co-occurrence items of each item in the transaction database.
Definition 7. [2] The projection set of item ik on database Ɗ, π(ik) = {tj ∈ Ɗ | ik ∈ tj}, is the set of transactions containing item ik. According to Definition 1,

sup(ik) = |π(ik)| / |Ɗ|    (2)

Example 3. See Table 1. Consider item B: its projection set on database Ɗ is π(B) = {t7, t9}, so sup(B) = |π(B)| / |Ɗ| = 2/10 = 0.2.
Definition 8. [2] The projection set of itemset X = {i1, i2, …, ik}, ij ∈ I (1 ≤ j ≤ k), is π(X) = π(i1) ∩ π(i2) ∩ … ∩ π(ik), and

sup(X) = |π(X)| / |Ɗ|    (3)

Example 4. See Table 1. Consider item E: its projection set on database Ɗ is π(E) = {t1, t3, t5, t6, t7, t9, t10}, so sup(BE) = |π(BE)| / |Ɗ| = |π(B) ∩ π(E)| / |Ɗ| = |{t7, t9} ∩ {t1, t3, t5, t6, t7, t9, t10}| / 10 = |{t7, t9}| / 10 = 0.2.
Definition 9. [2] (Reduced search space) Let ik ∈ I, with the items ordered in descending order of significance (i1 ≻ i2 ≻ … ≻ im); ik is called a kernel item. An itemset Xlexcooc ⊆ I is called the set of co-occurrence items of the kernel item ik if π(ik) ⊆ π(ik ∪ ij) for all ij ∈ Xlexcooc. We denote lexcooc(ik) = Xlexcooc.
Definition 10. [2] (Reduced search space) Let ik ∈ I, with the items ordered in descending order of significance; ik is called a kernel item. An itemset Ylexlooc ⊆ I is called the set of occurrence items of item ik in at least one transaction, but not co-occurrence items, if 1 ≤ |π(ik ∪ ij)| < |π(ik)| for all ij ∈ Ylexlooc. We denote lexlooc(ik) = Ylexlooc.
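Definitions 7 and 8 reduce support counting to intersections of transaction-id sets. A hedged sketch on the running example (function names are illustrative):

```python
# Running example database (Table 1), keyed by transaction id.
D = {"t1": "ACEF", "t2": "ACG", "t3": "EH", "t4": "ACDFG", "t5": "ACEG",
     "t6": "E", "t7": "ABCE", "t8": "ACD", "t9": "ABCEG", "t10": "ACEFG"}

def proj(X):
    """Projection set of itemset X (Definitions 7, 8): ids of transactions
    containing every item of X, via intersection of per-item tid-sets."""
    tidsets = [{tid for tid, t in D.items() if i in t} for i in X]
    return set.intersection(*tidsets)

def sup(X):
    return len(proj(X)) / len(D)  # Equations (2) and (3)

print(sorted(proj("BE")), sup("BE"))  # Example 4: ['t7', 't9'], support 0.2
```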


Algorithm Generating the Array of Co-occurrence Items
This algorithm generates the co-occurrence items of each item in the transaction database and stores them in the Kernel_COOC array, each element of which has 4 fields:
─ Kernel_COOC[k].item : kernel item ik;
─ Kernel_COOC[k].sup : support of kernel item ik;
─ Kernel_COOC[k].cooc : co-occurrence items of kernel item ik;
─ Kernel_COOC[k].looc : items occurring with kernel item ik in at least one transaction.

The framework of Algorithm 1 is as follows:

Algorithm 1. Generating the Array of Co-occurrence Items
Input : Dataset Ɗ
Output : Kernel_COOC array, matrix BiM
1:  foreach Kernel_COOC[k] do
2:      Kernel_COOC[k].item = ik
3:      Kernel_COOC[k].sup = 0
4:      Kernel_COOC[k].cooc = 2^m - 1
5:      Kernel_COOC[k].looc = 0
6:  foreach tj ∈ T do
7:      foreach ik ∈ tj do
8:          Kernel_COOC[k].sup ++
9:          Kernel_COOC[k].cooc = Kernel_COOC[k].cooc AND vectorbit(tj)
10:         Kernel_COOC[k].looc = Kernel_COOC[k].looc OR vectorbit(tj)
11: sort the Kernel_COOC array in descending order of significance
12: foreach ik ∈ I do
13:     Kernel_COOC[k].cooc = lexcooc(ik)
14:     Kernel_COOC[k].looc = lexlooc(ik)
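Lines 1-10 of Algorithm 1 can be mirrored with plain integer bit vectors. The sketch below is one possible reading (the names and the decoding step are illustrative, not the authors' code): the AND of the bit vectors of all transactions containing an item yields its co-occurrence candidates, and the OR its occurrence candidates.

```python
ITEMS = "ABCDEFGH"
D = ["ACEF", "ACG", "EH", "ACDFG", "ACEG", "E", "ABCE", "ACD", "ABCEG", "ACEFG"]

def vectorbit(t):
    return sum(1 << (len(ITEMS) - 1 - ITEMS.index(i)) for i in t)

def bits_to_items(v):
    return {i for i in ITEMS if (v >> (len(ITEMS) - 1 - ITEMS.index(i))) & 1}

def kernel_cooc():
    """Per item: (support count, cooc item set, looc item set), as in Table 5."""
    full = (1 << len(ITEMS)) - 1           # initial cooc = 2^m - 1 (line 4)
    sup = {i: 0 for i in ITEMS}
    cooc = {i: full for i in ITEMS}
    looc = {i: 0 for i in ITEMS}
    for t in D:                            # lines 6-10: one pass over the dataset
        vb = vectorbit(t)
        for i in t:
            sup[i] += 1
            cooc[i] &= vb                  # AND: items in *every* transaction with i
            looc[i] |= vb                  # OR: items in *some* transaction with i
    out = {}
    for k in ITEMS:
        co = bits_to_items(cooc[k]) - {k}
        lo = bits_to_items(looc[k]) - {k} - co
        out[k] = (sup[k], co, lo)
    return out

kc = kernel_cooc()
print(kc["F"])  # support count 3, cooc {A, C}, looc {D, E, G}: row F of Table 5
```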

We illustrate Algorithm 1 on the example database in Table 1. Initialization of the Kernel_COOC array, with m = 8 items in the database:

Item | A        | B        | C        | D        | E        | F        | G        | H
sup  | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0
cooc | 11111111 | 11111111 | 11111111 | 11111111 | 11111111 | 11111111 | 11111111 | 11111111
looc | 00000000 | 00000000 | 00000000 | 00000000 | 00000000 | 00000000 | 00000000 | 00000000

Each transaction is read once, from t1 to t10. Transaction t1 = {A, C, E, F} has the bit-vector representation 10101100;

Item | A        | B        | C        | D        | E        | F        | G        | H
sup  | 1        | 0        | 1        | 0        | 1        | 1        | 0        | 0
cooc | 10101100 | 11111111 | 10101100 | 11111111 | 10101100 | 10101100 | 11111111 | 11111111
looc | 10101100 | 00000000 | 10101100 | 00000000 | 10101100 | 10101100 | 00000000 | 00000000

Similarly, after transaction t10 = {A, C, E, F, G}, with bit-vector representation 10101110, the array is:

Item | A        | B        | C        | D        | E        | F        | G        | H
sup  | 8        | 2        | 8        | 2        | 7        | 3        | 5        | 1
cooc | 10100000 | 11101000 | 10100000 | 10110000 | 00001000 | 10100100 | 10100010 | 00001001
looc | 11111110 | 11101010 | 11111110 | 10110110 | 11101111 | 10111110 | 11111110 | 00001001


After the processing of Algorithm 1, the Kernel_COOC array is as shown in Table 5:

Table 5. Kernel_COOC array ordered in ascending order of support (lines 1 to 11)

Item | H    | B       | D    | F       | G          | E                | A             | C
sup  | 0.10 | 0.20    | 0.20 | 0.30    | 0.50       | 0.70             | 0.80          | 0.80
cooc | E    | A, C, E | A, C | A, C    | A, C       | –                | C             | A
looc | –    | G       | F, G | D, E, G | B, D, E, F | A, B, C, F, G, H | B, D, E, F, G | B, D, E, F, G

Execute command lines 12, 13, and 14 of Algorithm 1 (we add the sig field to show the items ordered by descending significance). For example, looc(G) = {B, D, E, F}, where B ≻ D ≻ F ≻ A ≻ E ≻ G, so lexlooc(G) = {}; similarly lexlooc(A) = {E, G}. The result is shown in Table 6.

Table 6. The Kernel_COOC array with co-occurrence items ordered in descending order of significance

Item | H    | B       | D    | F    | A    | C    | E    | G
sig  | 0.80 | 0.70    | 0.65 | 0.60 | 0.55 | 0.50 | 0.40 | 0.30
sup  | 0.10 | 0.20    | 0.20 | 0.30 | 0.80 | 0.80 | 0.70 | 0.50
cooc | E    | A, C, E | A, C | A, C | C    | –    | –    | –
looc | –    | G       | F, G | E, G | E, G | E, G | G    | –

3.2 Generating the List of nLOOC-Trees

In this section, we describe the algorithm that generates the list of nLOOC-Trees based on the Kernel_COOC array. Each node of an nLOOC_Tree has 2 main fields:

─ nLOOC_Tree[k].item : kernel item ik;
─ nLOOC_Tree[k].sup : support of item ik.
The framework of Algorithm 2 is as follows:

Algorithm 2. Generating the list of nLOOC-Trees (self-reduced search space)
Input : Kernel_COOC array, matrix BiM
Output : List of nLOOC-Trees
1:  foreach Kernel_COOC[k] do
2:      nLOOC_Tree[k].item = Kernel_COOC[k].item
3:      nLOOC_Tree[k].sup = Kernel_COOC[k].sup
4:      foreach ij ∈ tl do
5:          foreach ij ∈ Kernel_COOC[k].looc do
6:              if ij is not a child node of nLOOC_Tree[k] then
7:                  Add child node ij to nLOOC_Tree[k]
8:              else
9:                  Update the support of child node ij in nLOOC_Tree[k]
10: return list nLOOC_Tree


Fig. 1. The list of nLOOC-Trees based on the Kernel_COOC array

Each nLOOC-Tree has the following characteristics (Fig. 1):
─ The height of the tree is less than or equal to the number of items that occur in at least one transaction together with the kernel item (items are ordered in ascending order of significance support).
─ A single-path is an ordered pattern from the root node (kernel item) to a leaf node, and the support of the pattern is the support of the leaf node (ik ik+1 … iℓ).
─ A sub-single-path is the part of a single-path from the root node to any node of the ordered pattern, and its support is the support of the child node at the end of the sub-single-path.
Example 5. Consider kernel item F. From nLOOC-Tree(F) we obtain the single-path {FEG}, with sup(FEG) = 0.10 and sigsup(FEG) = 0.06, and the sub-single-path {FE}, with sup(FE) = 0.20 and sigsup(FE) = sig(FE) × sup(FE) = 0.60 × 0.20 = 0.12.
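The single-path and sub-single-path supports of Example 5 can be reproduced without materializing the tree. The sketch below is an assumed reading of Algorithm 2 (illustrative code): for every transaction containing the kernel item, the lexlooc items it contains form one significance-ordered path, and each prefix of that path accumulates support.

```python
SIG = {"A": 0.55, "B": 0.70, "C": 0.50, "D": 0.65,
       "E": 0.40, "F": 0.60, "G": 0.30, "H": 0.80}
D = ["ACEF", "ACG", "EH", "ACDFG", "ACEG", "E", "ABCE", "ACD", "ABCEG", "ACEFG"]

def nlooc_path_sup(kernel, lexlooc):
    """Support (as a fraction) of every single-path / sub-single-path of
    nLOOC-Tree(kernel), keyed by the path below the kernel item."""
    counts = {}
    for t in D:
        if kernel not in t:
            continue
        # lexlooc items of this transaction, ordered by descending significance
        path = tuple(sorted((i for i in t if i in lexlooc),
                            key=lambda i: -SIG[i]))
        for j in range(1, len(path) + 1):   # every prefix is a sub-single-path
            counts[path[:j]] = counts.get(path[:j], 0) + 1
    return {p: c / len(D) for p, c in counts.items()}

# Example 5: kernel F with lexlooc(F) = {E, G} (Table 6).
paths = nlooc_path_sup("F", {"E", "G"})
print(paths[("E",)], paths[("E", "G")])  # sup(FE) = 0.2, sup(FEG) = 0.1
```

Multiplying by sig(F) = 0.60 gives sigsup(FE) = 0.12 and sigsup(FEG) = 0.06, matching Example 5.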

3.3 Algorithm Generating All Closed Frequent Significance Itemsets

In this section, we illustrate the framework of the algorithm that DIRECTLY generates all closed frequent significance itemsets based on the nLOOC-Trees and the Kernel_COOC array.
Lemma 1. (DIRECT generation of a closed frequent significance itemset from co-occurrence items) For ik ∈ I, if sigsup(ik) ≥ minsigsup and Xlexcooc is the set of all elements of lexcooc(ik), then sigsup(ik ∪ Xlexcooc) = sigsup(ik) ≥ minsigsup and the itemset {ik ∪ Xlexcooc} ∈ CFSI.
Proof. According to Definition 9, (2), and (3), Xlexcooc is the set of co-occurrence items of the kernel item ik, satisfying π(ik) ⊆ π(ik ∪ ij), ∀ij ∈ Xlexcooc, and obviously π(ik) ⊆ π(ik ∪ Xlexcooc). Therefore, sup(ik) = sup(ik ∪ Xlexcooc) and sigsup(ik) = sigsup(ik ∪ Xlexcooc) = sig(ik ∪ Xlexcooc) × sup(ik) = sig(ik) × sup(ik) ≥ minsigsup, and according to Definition 10, ∀X’ ⊃ Xlexcooc, sup(X’) < sup(ik). ■
Example 6. See Table 6. Consider item F as kernel item (minsigsup = 0.15). The co-occurrence items of F are lexcooc(F) = {A, C}, so sup(FAC) = 0.30, sig(FAC) = sig(F) = 0.60, sigsup(FAC) = 0.60 × 0.30 = 0.18 ≥ minsigsup, and the itemset FAC is a closed frequent significance itemset.
Lemma 2. For ik ∈ I and Xlexcooc = lexcooc(ik), if a longest single-path/sub-single-path {ik ik+1 … iℓ} generated from nLOOC-Tree(ik) satisfies sig(ik) × sup(ik ik+1 … iℓ) ≥ minsigsup, then {ik ∪ … ∪ iℓ ∪ Xlexcooc} ∈ CFSI.
Proof. According to Definitions 9 and 10 and Lemma 1, we have |π(ik ∪ ylexlooc)| < |π(ik)| = |π(ik ∪ Xlexcooc)| (where ylexlooc contains the items of the single-path/sub-single-path, ylexlooc ⊆ lexlooc(ik)) and sigsup(ik ∪ ylexlooc) = sig(ik) × sup(ik ∪ ylexlooc) ≥ minsigsup. Therefore, sigsup(ik ∪ ylexlooc ∪ Xlexcooc) = sig(ik) × sup(ik ∪ ylexlooc) = sig(ik) × sup(ik ∪ ylexlooc ∪ Xlexcooc) ≥ minsigsup and {ik ∪ ylexlooc ∪ Xlexcooc} ∈ CFSI. ■
Example 7. See Table 6 and Fig. 1. Consider item A as kernel item (minsigsup = 0.15). The co-occurrence items of A are Xlexcooc = lexcooc(A) = {C}; from nLOOC-Tree(A) we obtain the single-path {AG}, with sup(AG) = 0.50, sig(AG) = sig(A) = 0.55, sigsup(AG) = 0.55 × 0.50 = 0.275 ≥ minsigsup; therefore sigsup(ACG) = 0.275 ≥ minsigsup and the itemset ACG is a closed frequent significance itemset.
Property 1. (Reduced generation space) If sup(ik-1) = sup(ik), ik ∈ lexcooc(ik-1), and sig(ik-1) = sig(ik), then ik-1 ≡ ik (after generating the closed frequent significance itemsets from item ik-1, item ik need not be considered).
The framework of Algorithm 3 is presented as follows:

Algorithm 3. Generating all closed frequent significance itemsets satisfying minsigsup
Input : minsigsup, Kernel_COOC array, nLOOC_Tree
Output : CFSI, the set of all closed frequent significance itemsets
1: foreach Kernel_COOC[k] with sigsup(Kernel_COOC[k].item) ≥ minsigsup do
2:     if NOT((Kernel_COOC[k-1].sup = Kernel_COOC[k].sup) AND
               (Kernel_COOC[k].item ∈ Kernel_COOC[k-1].cooc) AND
               (Kernel_COOC[k-1].sig = Kernel_COOC[k].sig)) then   //Property 1
3:         CFSI[k] = CFSI[k] ∪ {ik ∪ Kernel_COOC[k].cooc}          //Lemma 1
4:         if (Kernel_COOC[k].looc ≠ {}) then
5:             SSP = GenPath(nLOOC_Tree(Kernel_COOC[k].item))
6:             foreach sspj ∈ SSP do
7:                 if (Kernel_COOC[k].sig × sup(sspj) ≥ minsigsup) then   //long frequent itemset
8:                     CFSI[k] = CFSI[k] ∪ {ik ∪ Kernel_COOC[k].cooc ∪ sspj}   //Lemma 2
9: return CFSI


3.4 The Algorithm Diagram HPM-CFSI

In this section, we present the diagram of the HPM-CFSI algorithm for high-performance mining of closed frequent significance itemsets, shown in Fig. 2:

Fig. 2. The algorithm diagram for HPM-CFSI.

We illustrate Algorithm 3 on the example database in Tables 1 and 3, with minsigsup = 0.15. Algorithm 1 produces the Kernel_COOC array in Table 6, and Algorithm 2 the list of nLOOC_Trees in Fig. 1. Line 1: consider the items satisfying minsigsup as kernel items {F, A, C, E, G}.
Consider kernel item F: sigsup(F) = 0.60 × 0.30 = 0.18 ≥ minsigsup (Lemma 1, line 3) generates the closed frequent significance itemset CFSI[F] = {(FAC; 0.18)}; (lines 4 to 8) from nLOOC-Tree(F) we obtain the single-paths {FEG} and {FG}: sigsup(FEG) = 0.60 × 0.10 = 0.06 < minsigsup; sigsup(FG) = 0.60 × 0.20 = 0.12 < minsigsup.
Consider kernel item A: sigsup(A) = 0.55 × 0.80 = 0.44 ≥ minsigsup (Lemma 1, line 3) generates CFSI[A] = {(AC; 0.44)}; (lines 4 to 8) from nLOOC-Tree(A) we obtain the single-paths {AEG} and {AG}: sigsup(AEG) = 0.55 × 0.30 = 0.165 ≥ minsigsup and sigsup(AG) = 0.55 × 0.50 = 0.275 ≥ minsigsup, generating the closed frequent significance itemsets CFSI[A] = {(AC; 0.44)} ∪ {(AEG; 0.165), (AG; 0.275)} (Lemma 2).
Consider kernel item C (similarly to kernel item A): sigsup(C) = 0.50 × 0.80 = 0.40 ≥ minsigsup (Lemma 1, line 3) generates CFSI[C] = {(C; 0.40)}; (lines 4 to 8) from nLOOC-Tree(C) we obtain the single-paths {CEG} and {CG}: sigsup(CEG) = 0.50 × 0.30 = 0.15 ≥ minsigsup and sigsup(CG) = 0.50 × 0.50 = 0.25 ≥ minsigsup, generating CFSI[C] = {(C; 0.40)} ∪ {(CEG; 0.15), (CG; 0.25)} (Lemma 2).
Consider kernel item E (similarly to kernel item C): sigsup(E) = 0.40 × 0.70 = 0.28 ≥ minsigsup (Lemma 1, line 3) generates CFSI[E] = {(E; 0.28)}; (lines 4 to 8) from nLOOC-Tree(E) we obtain the single-path {EG}: sigsup(EG) = 0.40 × 0.30 = 0.12 < minsigsup.
Consider kernel item G (similarly to kernel item E): sigsup(G) = 0.30 × 0.50 = 0.15 ≥ minsigsup (Lemma 1, line 3) generates CFSI[G] = {(G; 0.15)}.


Table 7 shows the closed frequent significance itemsets at minsigsup = 0.15.

Table 7. CFSI satisfying minsigsup = 0.15 (example database in Tables 1 and 3)

Kernel item | Closed frequent significance itemsets – CFSI (#CFSI = 9)
F           | (FAC; 0.18)
A           | (AC; 0.44), (ACG; 0.275), (ACEG; 0.165)
C           | (C; 0.40), (CG; 0.25), (CEG; 0.15)
E           | (E; 0.28)
G           | (G; 0.15)
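Each (itemset; value) pair in Table 7 can be re-derived from the raw data via Equation (1). The hedged sketch below (illustrative code, not the paper's implementation) recomputes sigsup for every reported itemset:

```python
D = ["ACEF", "ACG", "EH", "ACDFG", "ACEG", "E", "ABCE", "ACD", "ABCEG", "ACEFG"]
SIG = {"A": 0.55, "B": 0.70, "C": 0.50, "D": 0.65,
       "E": 0.40, "F": 0.60, "G": 0.30, "H": 0.80}

def sigsup(X):
    supp = sum(1 for t in D if set(X) <= set(t)) / len(D)
    return max(SIG[i] for i in X) * supp  # Equation (1)

# (itemset; sigsup) pairs reported in Table 7.
table7 = {"FAC": 0.18, "AC": 0.44, "ACG": 0.275, "ACEG": 0.165,
          "C": 0.40, "CG": 0.25, "CEG": 0.15, "E": 0.28, "G": 0.15}
for X, v in table7.items():
    assert abs(sigsup(X) - v) < 1e-9, (X, sigsup(X), v)
print("all", len(table7), "entries of Table 7 check out")
```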

4 Experiments

All experiments were conducted on a PC with a Core Duo CPU T2500 2.0 GHz and 4 GB of main memory, running Windows 7 Ultimate. All code was compiled using C#, Microsoft Visual Studio 2010, .NET Framework 4. We experimented on two types of datasets (see Table 8):
- Two real dense datasets from the UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]: the Chess and Mushroom datasets.
- Two synthetic sparse datasets generated by software of the IBM Almaden Research Center [http://www.almaden.ibm.com]: the T10I4D100K and T40I10D100K datasets.

Table 8. Description of the datasets used in the experiments

Name        | #Trans  | #Items | #Avg.Length | Type   | Density (%)
Chess       | 3,196   | 75     | 37          | Dense  | 49.3
Mushroom    | 8,142   | 119    | 23          | Dense  | 19.3
T10I4D100K  | 100,000 | 870    | 10          | Sparse | 1.1
T40I10D100K | 100,000 | 942    | 40          | Sparse | 4.2

In addition, we created a table storing the significance values of the items, drawn as random real values in the range 0 to 1. This is the first proposed algorithm for CFSI mining based on an approach that does not satisfy the downward closure property. To evaluate the performance of the proposed algorithm, we modified the Charm [4] and CFIM-P [6] algorithms to mine closed frequent significance itemsets, obtaining the WCharm (based on the IT-Tree) and WCFIM-P (based on the FP-Tree) algorithms. We therefore compared the HPM-CFSI algorithm with the WCharm and WCFIM-P algorithms.
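The density column of Table 8 appears to be the average transaction length relative to the number of distinct items; a quick illustrative check (assumed formula, reproduced here because the reported values match it exactly):

```python
# (name, #trans, #items, avg. transaction length) from Table 8.
datasets = [("Chess", 3_196, 75, 37), ("Mushroom", 8_142, 119, 23),
            ("T10I4D100K", 100_000, 870, 10), ("T40I10D100K", 100_000, 942, 40)]

for name, _, n_items, avg_len in datasets:
    density = 100 * avg_len / n_items   # percentage of matrix columns set per row
    print(f"{name}: {density:.1f}%")
```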


Fig. 3. Running time of the three algorithms on Chess and Mushroom datasets.

Fig. 3 (a) and (b) show the running times of the compared algorithms on the real datasets Chess and Mushroom. The HPM-CFSI algorithm runs faster than the WCharm and WCFIM-P algorithms at all minimum significance supports.

Fig. 4. Running time of the three algorithms on T10I4D100K and T40I10D100K datasets.

Fig. 4 (a) and (b) show the running times of the compared algorithms on the synthetic datasets T10I4D100K and T40I10D100K. The HPM-CFSI algorithm again runs faster than the WCharm and WCFIM-P algorithms. Overall, the experiments suggest the following ordering as far as running time is concerned: HPM-CFSI runs faster than WCharm and WCFIM-P at all minsigsup values on both real and synthetic datasets.

5 Conclusion

In this paper, we proposed a high-performance algorithm for mining closed frequent significance itemsets on transaction databases, consisting of three phases: in the first phase, we quickly detect the Kernel_COOC array of co-occurrences and occurrences of each kernel item in at least one transaction; in the second phase, we build the list of nLOOC-Trees based on the Kernel_COOC array and a binary matrix of the dataset (self-reduced search space); in the last phase, the proposed algorithm quickly mines CFSI based on the nLOOC-Trees. Moreover, when mining CFSI with a different minsigsup value, the proposed algorithm only performs the mining based on the nLOOC-Trees computed previously (the second phase, Algorithm 2), significantly reducing the


processing time. The experimental results show that the proposed algorithm performs better than other existing algorithms. In the future, we will extend the HPM-CFSI algorithm to mine closed frequent significance itemsets on multi-core, many-CPU, GPU, and distributed computing systems.

Acknowledgements. This work was supported by the University of Social Sciences and Humanities and the University of Science, VNU-HCMC, Ho Chi Minh City, Vietnam.

References

1. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. ACM SIGMOD IC. on Management of Data, Washington, DC, 207–216 (1993)
2. Phan, H., Le, B.: A Novel Algorithm for Frequent Itemsets Mining in Transactional Databases. In: Trends and Applications in Knowledge Discovery and Data Mining, PAKDD 2018. LNCS, vol. 11154, 243–255. Springer, Cham (2018)
3. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: 7th IC. on Database Theory, 398–416 (1999)
4. Zaki, M.J., Hsiao, C.: CHARM: An efficient algorithm for closed association rule mining. In: 2nd SIAM IC. on Data Mining, 457–473 (2002)
5. Wang, J., Han, J., Pei, J.: CLOSET+: searching for the best strategies for mining frequent closed itemsets. In: Proceedings of the 9th ACM SIGKDD IC. on Knowledge Discovery and Data Mining, Washington, DC, USA, 236–245 (2003)
6. Nair, B., Tripathy, A.K.: Accelerating Closed Frequent Itemset Mining by Elimination of Null Transactions. Journal of Emerging Trends in Computing and Information Sciences 2(7) (2011)
7. Cai, C.H., Fu, A.W., Cheng, C.H., Kwong, W.W.: Mining association rules with weighted items. In: Proceedings of the International Database Engineering and Applications Symposium (IDEAS 98), 68–77 (1998)
8. Tao, F., Murtagh, F., Farid, M.: Weighted association rule mining using weighted support and significance framework. In: SIGKDD ’03, 661–666 (2003)
9. Khan, M.S., Muyeba, M., Coenen, F.: A weighted utility framework for mining association rules. In: Proceedings of the Second UKSIM European Symposium on Computer Modeling and Simulation, 87–92 (2008)
10. Huai, Z., Huang, M.: A weighted frequent itemsets incremental updating algorithm based on hash table. In: 3rd International Conference on Communication Software and Networks (ICCSN), IEEE, 201–204 (2011)


Object Detection Using Synthesized Data

Matija Burić1[0000-0003-3528-7550], Goran Paulin2[0000-0002-6885-5393],

and Marina Ivašić-Kos 3[0000-0002-1940-5089]

1 Hrvatska elektroprivreda d.d., SIT Rijeka, Kumičićeva 13, Rijeka, Croatia 2 Kreativni odjel d.o.o. Rijeka, Croatia 3 Department of Informatics, University of Rijeka, Rijeka, Croatia [email protected], [email protected], [email protected]

Abstract. Successful object detection using a CNN requires a lot of well-annotated training data, which is currently not available for action recognition in the handball domain. Augmenting a real-world image dataset with synthesized images is not a novel approach, but the effectiveness of creating such a dataset and the quantities of generated images required to improve detection can be. Starting with a relatively small training dataset, by combining traditional 3D modeling with proceduralism and optimizing the generator-annotator pipeline to keep rendering and annotating time under 3 FPS, we achieved 3x better detection results using YOLO, while only tripling the training dataset.

Keywords: Object Detection ∙ Convolutional Neural Network ∙ YOLO ∙ Syn- thesized Data ∙ Sports ∙ Handball.

1 Introduction

The conventional approach of the majority of published methods for object detection using neural networks requires a prepared training dataset to provide successful results. These data are usually acquired from sources similar to those on which the model will ultimately be applied. In the case of supervised learning (Fig. 1), authors are usually limited to publicly available labeled datasets, or are forced to obtain data on their own.

Fig. 1. The traditional supervised learning model


Publicly available datasets like MS COCO [1], PASCAL VOC [2], ImageNet [3], etc. provide extensive sets of images along with a way to extract labels in a form suitable for a researcher. However, if someone needs a specific set of data, as in the case of handball sport, the possibilities are limited to a small number of images that are not labeled at all or are not labeled in the desired way – they can miss a class, or classes could be mixed, which can result in adjusting annotations or discarding them completely. An additional challenge when preparing annotations is the possible need for expert knowledge of the researched domain, which can significantly affect the cost of the whole project. Labeling of still images for object detection is usually done by drawing a bounding box around the desired object or by marking every pixel the object occupies in the image, as shown in Fig. 2. Information about every marked object in the image is saved in an individual file, which minimally consists of the object position in the image, the object class, and the image filename.

Fig. 2. Bounding box vs. instance segmentation on players and ball objects [4]

Since providing the mentioned information is rather hard for a human without a graphical reference, researchers have released many GUI tools which make labeling more intuitive and speed up the process of manual annotation. The tools used in this research are LabelImg [5], which provides bounding boxes in the YOLO format, and VGG Image Annotator (VIA) [6], which provides polygons around objects for use with instance segmentation (Fig. 3). Annotations can be saved in different formats, but the most widely accepted are XML and JSON.

Fig. 3. LabelImg [5] and VGG Image Annotator (VIA) [6] labeling tools
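For reference, the YOLO annotation format produced by tools like LabelImg stores, per object, the class index and the bounding box center and size, all normalized to the image dimensions. The converter below is an illustrative sketch (not the tool's own code; the frame size and class id are made up for the example):

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into one line of
    a YOLO .txt annotation: 'class x_center y_center width height', normalized."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A hypothetical player bounding box in a 1920x1080 frame, class 0.
print(to_yolo_line(0, (480, 270, 960, 810), 1920, 1080))
# -> 0 0.375000 0.500000 0.250000 0.500000
```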

Due to the limitations of public datasets, many researchers acquire datasets themselves [7] or via a 3rd party, which consumes many resources but provides a custom dataset specially designed for the project. Since the data preparation can easily exceed

111 ICT Innovations 2019 Web Proceedings ISSN 1857-7288

the time needed for building a successful model, the question arises: could this be improved somehow? One should also take into account that a bigger dataset will generally result in a more successful model [8].

To enlarge the real dataset and consequently improve the classification results, researchers have already tried to use synthesized images. In [9], using coarse 3D models of drones and aircraft with an optimized rendering pipeline, the authors showed that using only 12 real images combined with 1500 or more synthetic images can significantly improve the performance of an image classifier.

However, in sports scenes, we are dealing with non-rigid human bodies, their poses and actions, which requires a great diversity of synthetic images to produce a usable synthetic training set. Pose space coverage and texture diversity proved to be the key ingredients for the effectiveness of synthetic data training in [10]. The authors generated about 5M synthetic images to train a human 3D pose estimation model. For a similar purpose, more than 6M synthetic images were generated in [10] for human depth and part segmentation in real images. The authors showed that the best results are achieved when the synthetically trained network is fine-tuned with real images.

In [11], a parametric generative model of human action videos was introduced, relying on physics, scene composition rules, and procedural animation techniques. Approximately 6M images at 3.6 FPS were generated and combined with small datasets of real images. It was shown that combining a large set of synthetic images with small real datasets can boost recognition performance.

In the scope of this research, we address the problem of detecting an athlete in handball scenes, for which large datasets are not available.
We start with a relatively small dataset and enlarge it with generated synthesized images to increase the training set needed to train the object detection model, and we evaluate the impact of this enlargement on model performance. YOLO is selected as the dedicated object detector, mainly for its speed and the fact that previous studies [4, 8, 12], which form the basis for comparison, used the same YOLO methods.

The rest of the paper is organized as follows. In Section 2, the synthesized data generation is presented. The experiment, a comparison of the performance of the YOLO object detector on the custom and synthesized datasets, and a discussion are given in Section 3. The paper ends with a conclusion and proposals for future research.

2 Synthesized data

Our task was to train the YOLO object detector on handball scenes where players and balls are the objects of interest. The recordings were captured during handball exercises in a gym with artificial illumination from the top of the field and sunlight from the side windows. The camera position was fixed to a few different spots, covering from ¼ to ¾ of the field. The persons in the scenes are mainly young players performing handball techniques and actions such as the jump shot, dribbling, and defense. To generate the synthesized data, it was necessary to analyze the contents of the reference video scenes and to select the elements to be reproduced in the virtual scene.


The content is split into three groups: a) the environment (the sports hall), b) the player(s), and c) accessories (the ball).

Since the experiment required a single 3D environment, it was created with traditional box modeling techniques using Autodesk 3ds Max [13], and then textured using both Adobe Photoshop [14], for manual texture preparation, and Allegorithmic Substance Designer [15], for the procedural textures.

To generate the player, a procedural modeling/texturing route was chosen to enable the generation of many virtual player variations by varying body shape, sex, hair, clothing, shoes, and accessories. Another challenge was preparing a virtual character for animation. Both problems were solved by using Adobe Fuse [16] and Mixamo [17], producing auto-rigged, textured 3D models ready for animation.

Since a library of handball players' motion capture data could not be acquired, the virtual character was manually animated in 3ds Max (Fig. 4), adding inverse kinematics on top of Mixamo's animation rig, resulting in 123 frames of animation. Frames from the video were used as the animation reference.

Fig. 4. Handmade animation in 3ds Max and virtual scene assembling in Unity

For the ball animation, a combination of static parenting of the ball to the player's hand and a physical simulation of shooting it, using Nvidia PhysX in Unity [18], was used. The virtual scene was assembled in Unity using the aforementioned 3D objects and lit with a single area light, simulating interior lighting, and a virtual Sun, Fig. 4.

The camera rig was built to allow changing the camera position and orientation from code, reframing the scene after a single animation sequence is played. The camera rig's rotational pivot was placed at the center between the player and the goal. The camera is positioned 8 meters away from the rotational pivot, at a height of 1.5 meters, to match the camera placement from the reference video. To produce 984 synthesized images, the camera rig is rotated by 45 degrees every 123 frames, rendering 8 different views. Along with the beauty pass rendering (a photorealistic PNG image), pixel-perfect masks representing the player and ball positions were generated and stored in the red (the ball) and green (the player) channels of a PNG image, Fig. 5.
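The camera rig geometry described above (8 views in 45-degree steps, 8 m from the pivot at a height of 1.5 m) can be sketched in Python; the helper below is illustrative, not code from the paper:

```python
import math

def camera_positions(radius=8.0, height=1.5, views=8):
    """Camera positions on a circle around the rotational pivot,
    one per 360/views-degree step (8 views of 45 degrees here)."""
    positions = []
    for i in range(views):
        angle = math.radians(i * 360.0 / views)
        positions.append((radius * math.cos(angle), height,
                          radius * math.sin(angle)))
    return positions
```

Each position would be applied after one 123-frame animation pass, yielding the 984 = 8 × 123 rendered images.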


Fig. 5. Rendered beauty pass (photorealistic image) and the mask (from a different frame)

The average times for rendering the photorealistic image and the accompanying mask, using an SSD for storage, are 178 ms and 84 ms, respectively. After rendering, the mask sequence is processed using Python, calculating bounding boxes for the green and red channels and writing them to XML files in PASCAL VOC format. The average time for creating a single annotation is 40 ms.
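A minimal sketch of this post-processing step might look as follows, assuming NumPy and the standard library; the function names are ours, not from the paper:

```python
import numpy as np
import xml.etree.ElementTree as ET

def channel_bbox(mask, channel, threshold=128):
    """Bounding box (xmin, ymin, xmax, ymax) of pixels above threshold
    in the given channel of an HxWx3 mask, or None if empty."""
    ys, xs = np.nonzero(mask[:, :, channel] > threshold)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def mask_to_voc(mask, filename):
    """Build a minimal PASCAL VOC annotation from a rendered mask image
    (ball in the red channel, player in the green channel)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(mask.shape[1])
    ET.SubElement(size, "height").text = str(mask.shape[0])
    for name, channel in (("ball", 0), ("player", 1)):
        box = channel_bbox(mask, channel)
        if box is None:
            continue
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        bnd = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bnd, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Running this over each rendered mask PNG and writing the returned XML to disk would reproduce the described annotation pipeline.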

3 Experiment

3.1 Datasets

For training the models, two types of datasets were used. The input image size in both cases is 1920x1080 px (full HD) resolution.

The first one (the real dataset) was obtained via GoPro cameras at heights of 1.5 m and 3.5 m at the border of the field during one week at a handball school, without adjusting the scene, in order to preserve real-world conditions. The videos were captured mostly in an indoor hall with artificial lighting, but part of the footage was collected outdoors during clear weather. The ball and the players' clothing are of different colors, not following the two-team rule. The persons in the scenes are mainly young players with some spectators, and there are many ball occurrences in a single shot. The dataset has 418 images, split roughly at a 70:30 training-to-test ratio.

The second, synthesized dataset is computer generated to resemble real-world conditions as closely as possible. The point of view is from different positions, but the 1.5 m camera height is preserved. It consists of one-player-one-ball images at different stages of the handball jump-shot action. The player is modeled as a young female player, dressed in team colors, with the ball. The dataset has 984 images, split between training and test parts in the same ratio as the real dataset.

3.2 Models

The YOLO (You Only Look Once) [19] object detector is used for training and for testing the performance on synthesized and real data. It is a specially designed neural network that first divides the input image into a grid of smaller segments and, at the same time, produces a probability distribution over the object classes and several candidate bounding boxes with an associated confidence for each segment. If there is no object inside a bounding box, the confidence will be zero; otherwise,


YOLO takes the intersection-over-union with the ground truth into account to determine the confidence level. When an object spans two or more segments, YOLO elects the cell containing the object's center as the prediction candidate for the object. YOLO achieves satisfactory accuracy, although not at the level of methods such as Mask R-CNN [20, 21] – the advantage is that it is significantly faster [12].

Fig. 6. The way YOLO presents object detection data [22]

The problem that arises, according to the research results of [4, 12], is that pre-trained YOLO shows insufficient ball detection results in handball scenes, which can be improved by training the model on more handball-related data. Also, YOLO, along with the other methods tested in [4], has difficulty detecting smaller objects or objects far away from the camera. For this reason, based on the research in [8], a configuration was adopted to train a new model on only two classes – the player and ball objects – with an input image size of 1024x1024 px. These settings lower the detection speed but partly handle the mentioned problem. Since computer hardware is steadily improving, it is reasonable to believe this will have little impact in the long run.

For the sake of comparing the results of training with synthesized and real data, three models were trained. In order to optimize the training process, transfer learning was performed; this way, training the basic features is avoided [23]. A variable learning rate was also applied: in the first epoch the learning rate (LR) was set to 0.001, and it was later reduced to 0.00001. If the LR is too small, converging to a minimum requires significant computation and, consequently, a lot of time; with a too-large LR, convergence is faster, but it is much harder to reach a minimum. A too-high LR is likely to overshoot, and a too-low LR is likely to get stuck at a local minimum, unable to converge to the global minimum. An optimized LR combines the speed of a higher LR at the beginning with the precision of a smaller LR later in the process (Fig. 7).

Training is further improved by a variable batch size, where multiple samples are passed through the network at once. Since the input image size is increased to 1024x1024 px, a higher batch size requires a lot of memory, and therefore the maximum batch size applied in the experiment was set to 8.


Fig. 7. From left to right: an optimized, a too-small, and a too-big LR. A too-small LR is slow and takes a lot of resources. A too-big LR, in this case, moves further away from the minimum.
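The variable learning-rate schedule described above reduces to a simple step function. The sketch below uses the values reported in the paper; the function name and signature are ours:

```python
def learning_rate(epoch, initial=0.001, reduced=0.00001):
    """Step LR schedule: a larger step in the first epoch for speed,
    then a much smaller step for precise convergence."""
    return initial if epoch == 0 else reduced
```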

The first reference model (Y-R) was trained using real images for approximately 20 epochs and is used as the baseline for comparison. The second model (Y-S) was trained using only synthesized data for 20 epochs. The third model (Y-RS) was trained with both real and synthesized data, altogether 987 images, for slightly less than 20 epochs.

The Y-R and Y-RS models were unstable during training. This is most likely because the input images contain objects that are small in size, occupying only a fraction of the image. Since the described method rescales these images during preprocessing, the annotated bounding boxes become too small, which sometimes results in an error. Such behavior was also observed in [7].

3.3 Environment and metric

The model was trained on a 12-core E5-2680v3 CPU server with one GeForce GTX TITAN X GPU with 12 GB of memory. YOLOv2 was implemented on the Debian Linux operating system, with some additional programming in Python. Testing was performed on a different server using only a 12-core E5-2680v3 CPU. Training and testing speeds did not differ significantly from the expected per-image train/test times.

Performance is measured using Average Precision (AP) [24], and the detections of each model are compared to the ground truth, with a detection counted as a match when the Intersection over Union (IoU) is equal to or greater than 50%. For reference, see Fig. 8.

Fig. 8. Visual description of Intersection Over Union (IoU) [8]
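The IoU criterion can be computed directly from two axis-aligned boxes. A minimal sketch, with boxes as (xmin, ymin, xmax, ymax) tuples:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

Under the evaluation protocol described above, a detection would count as correct when iou(detection, ground_truth) >= 0.5.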


To avoid multiple detections of the same object, a detection is counted as a True Positive (TP) only if the object has not been detected yet. The precision-recall curve is calculated for the person and the ball individually. AP is then computed as the area under the curve. The example in Fig. 9 shows the AP of person detection using Y-RS, trained on both the synthesized and real datasets and tested on the real dataset.

Fig. 9. The precision-recall curve for person detection with the Y-RS model, where AP is the area under the curve
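One common way to compute AP as the area under the precision-recall curve is sketched below (all-point interpolation; this is one of several conventions and not necessarily the exact variant used in the paper):

```python
import numpy as np

def average_precision(recall, precision):
    """AP as the area under the precision-recall curve, with precision
    made monotonically non-increasing from right to left."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum rectangle areas wherever recall increases
    idx = np.nonzero(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```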

3.4 Testing

Table 1 shows the results of the tested models using mean average precision (mAP) and per-class AP on the real and synthesized test sets. A higher percentage means a better result.

The object of our interest was to get the best possible results of the object detector for a player and a ball in a set of real scenes. Since players greatly resemble individuals in the audience, the authors decided to treat all person detections as positives, whether a player or a spectator is detected. Due to the variety of player clothing at this point of the research, it is hard even for humans to distinguish a spectator from a player based solely on appearance, so it was decided to disregard false positives caused by spectators detected as players and to treat all player-like objects as players.


The experiment has shown that the model Y-R, trained only on the real dataset, has the worst detection results (8.49% mAP) of the models trained in the experiment. Real ball objects were detected poorly compared to average human observation, and there were many false positives for player objects.

Training the model on synthesized data (Y-S) achieved 41.16% mAP on the synthesized test set. This was expected, since the training samples resemble this test set more closely than the real dataset does. The results on the synthesized test set were mainly used to validate the efficiency and the ability to perform detection on the artificial dataset. The results on the real set, which was far more interesting for our task, were significantly worse (9.76% mAP). It is important to note that the synthesized dataset contained only one player handling the ball. Also, the sports jersey of the player and the color of the ball were the same in all scenes, which leads to overfitting of the model and problems with real-world detection. With small variations, such as changing the color of the athlete's clothing, the model would probably improve.

Combining the synthetic data with real data proved to be a good idea, and the model Y-RS, trained on both synthesized and real images, achieved 25.73% mAP, the best detection result on the real dataset. The precision for the ball object more than doubled and the precision for the player object roughly tripled. Considering that the authors had only a limited number of real samples at their disposal, mAP was significantly improved, in fact tripled, when roughly 1000 additional synthesized images were included in the model.

Table 1. Class AP and mAP of tested models on the real and synthesized test set

Model   Test set      Ball AP   Player AP   mAP
Y-R     real           1.39 %    15.59 %     8.49 %
Y-S     synthesized    6.36 %    75.95 %    41.16 %
Y-S     real           1.14 %    18.37 %     9.76 %
Y-RS    real           3.05 %    48.41 %    25.73 %

Below are some typical examples of handball scenes and the results of player and ball detection for the trained models. The detection results of the trained models are analyzed on the same image examples to enable an accurate comparison of model performance. Fig. 10 shows detection results in the form of green boxes around objects, denoting true positives (TP), red boxes, denoting false positives (FP), and blue boxes, denoting the ground truth (GT).

Y-R handles the detection of players not affected by occlusion more easily than Y-S. Still, it has problems detecting small objects, along with the person sitting far left. When combined in Y-RS, the detection results improve slightly, as in the case where a coach is detected even though he is behind one of the players, but the model generates more FPs, like the detected ball in the right part of the image.


Fig. 10. Performance of Y-R (A), Y-S (B) and Y-RS (C) model top to bottom

When the model Y-S is tested on a synthesized dataset similar to the one it was trained on, the overall results are much better, as seen in Fig. 11; however, one should take into account that our goal is to train a model that can successfully detect players and balls in real handball scenes.

Fig. 11. Time lapse of the Y-S detection of synthesized data

The example in Fig. 12 emphasizes the benefit of training on both training sets. The top and middle parts of the image show similar results, where Y-R manages to find one player partly behind the glass obstacle and Y-S manages to find another close by. When combined in Y-RS, both players are detected successfully, plus one further back behind them. Also notable are the partial detections of players behind the net. Compared to Fig. 10, in Fig. 12 the action ball and the static ball on the right are partly detected, even though humans would have more trouble distinguishing the action ball from the background noise in this case.


Fig. 12. Similar results of Y-R (top - A) and Y-S (middle - B) and improvement of detection when Y-RS (bottom – C) is used

The last example, in Fig. 13, shows the improvement of both Y-S and Y-RS over the Y-R method, where nearby players were not detected at all or were only partly detected. Another observation is that the ball object detected using the Y-S method is very similar in color and texture to the ball used in training that specific model. It can be compared to the ball object in the resulting detection in Fig. 11.


Fig. 13. Example of improved result with Y-S (A) over Y-R (B) and even more improved using Y-RS (C)

The ball far back in the right part of the image was also detected, which is notable considering its size and partial visibility.

4 Conclusion

In some cases, when a set of training data is not large enough, or does not exist, synthesized data can provide an appropriate alternative that, besides the ability to collect data and increase the set of images, also allows for different scene variations with respect to camera position, framing, brightness, and the like.

In this paper, we used a training set for the YOLO object detector that consisted of real and synthesized handball scenes. The performance of three YOLO models trained on the real (Y-R), synthesized (Y-S), and mixed training sets (Y-RS) was compared on the real test set. When only one type of data was used for training the detection model, with


a relatively small dataset, quite poor detection results were obtained (the Y-S and Y-R models). The Y-R model proved more successful with overlapping objects but fails to detect obvious ones, probably because it is missing more training data. The Y-S model manages to detect usual player postures and balls with distinguished edges, but seems to be overfitted; it misses the detection of some obvious objects. Significantly better results were achieved when both synthesized and real data were used for training, as in the Y-RS model. The objects are detected more precisely, with greater IoU, and objects recognized by neither Y-R nor Y-S, whether too small or occluded, get detected. This confirms that a larger number of images in a training set will result in a more efficient object detector, even when part of the set is synthesized.

Also, the synthesized images used in this experiment were a significantly simplified simulation of the real world, with only one player, always in the same outfit, and a single ball, featuring only different camera views. That means that, at this point, less data is available from ten synthesized images than from one real image containing ten players. On the other hand, this could help in distinguishing between players and the audience, because players during a game are usually dressed in team colors, different from the spectators. It should be noted that in this research all human objects were treated as players, due to the fact that our real dataset consists of casually dressed players, some of whom are active on the field while others sit as an audience, waiting their turn during competition and practice. Regardless of this, the real and synthesized datasets contributed to improving the detection results for the player and the ball in a realistic scenario. Similar models could be applied in other sports as well, with minor adjustments.
The major problems that arise in handball, such as occlusion and small or distant objects, are also present in other sports, and to this point there is no universal solution. Some can be handled with an even higher resolution of the input images, a different point of view, or the use of multiple view sources, but this extends beyond the scope of this research. Taking into account the previous observations, and the fact that with slightly more effort at the beginning of building a synthesized dataset one can accumulate a much larger number of images for training, the results look promising for future research.

An additional step that could improve the results, and is considered for future research, is to apply a newer version of YOLO. The newer YOLOv3 was reported to be more accurate in detecting smaller objects, which can be beneficial for detection in handball.

Acknowledgment

This research was fully supported by the Croatian Science Foundation under the project IP-2016-06-8345 "Automatic recognition of actions and activities in multimedia content from the sports domain" (RAASS) and by the University of Rijeka under the project number 18-222-1385.


References

1. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: Lecture Notes in Computer Science, pp. 740–755 (2014).
2. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88, 303–338 (2010).
3. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009).
4. Burić, M., Pobar, M., Ivašić-Kos, M.: Ball detection using YOLO and Mask R-CNN. In: 5th Annual Conference on Computational Science and Computational Intelligence (2018).
5. GitHub: LabelImg, https://github.com/tzutalin/labelImg, last accessed 2019/07/29.
6. Dutta, A., Zisserman, A.: The VIA annotation software for images, audio and video. In: Proceedings of the 27th ACM International Conference on Multimedia (2019).
7. Ivašić-Kos, M., Pobar, M.: Building a labeled dataset for recognition of handball actions using Mask R-CNN and STIPS. In: 7th European Workshop on Visual Information Processing (EUVIP), IEEE (2018).
8. Burić, M., Pobar, M., Ivašić-Kos, M.: Adapting YOLO network for ball and player detection. In: ICPRAM, pp. 845–851 (2019).
9. Chen, W., Wang, H., Li, Y., Su, H., Wang, Z., et al.: Synthesizing training images for boosting human 3D pose estimation. In: Fourth International Conference on 3D Vision (3DV) (2016).
10. de Souza, C.R., Gaidon, A., Cabon, Y., López, A.M.: Procedural generation of videos to train deep action recognition networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017).
11. Varol, G., Romero, J., et al.: Learning from synthetic humans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 109–117 (2017).
12. Burić, M., Pobar, M., Ivašić-Kos, M.: Object detection in sports videos. In: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 1034–1039 (2018).
13. 3ds Max 2019 Help, http://help.autodesk.com/view/3DSMAX/2019/ENU, last accessed 2019/07/29.
14. Photoshop User Guide, https://helpx.adobe.com/photoshop/user-guide.html, last accessed 2019/07/29.
15. Substance Designer User Guide, https://docs.substance3d.com/sddoc/substance-designer-user-guide-102400008.html, last accessed 2019/07/29.
16. Adobe Fuse CC (Beta) help topics, https://helpx.adobe.com/beta/fuse/topics.html, last accessed 2019/07/29.
17. Animate 3D characters with Mixamo, https://helpx.adobe.com/creative-cloud/help/animate-characters-mixamo.html, last accessed 2019/07/29.
18. Unity - Manual: Unity User Manual (2018.3), https://docs.unity3d.com/2018.4/Documentation/Manual/, last accessed 2019/07/29.
19. Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017).


20. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (2017).
21. Pobar, M., Ivašić-Kos, M.: Mask R-CNN and optical flow based method for detection and marking of handball actions. In: 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) (2018).
22. AI Zone: Understanding object detection using YOLO, https://dzone.com/articles/understanding-object-detection-using-yolo, last accessed 2019/07/29.
23. Cook, D., Feuz, K.D., Krishnan, N.C.: Transfer learning for activity recognition: a survey. Knowl. Inf. Syst. 36, 537–556 (2013).
24. Ivašić-Kos, M., Ipšić, I., Ribarić, S.: A knowledge-based multi-layered image annotation system. Expert Systems with Applications 42(24) (2015).


Comparing Databases for Inserting and Querying Jsons for Big Data

Panche Ribarski, Bojan Ilijoski, and Biljana Tojtovska

Faculty of Computer Sciences and Engineering, Skopje, Macedonia [email protected], [email protected], biljana.tojtovska@finki.ukim.mk

Abstract. This paper tackles the topic of performance in Big Data data ingestion, data querying, and data analytics. We test the import of the Last.fm Million Song Dataset into Cassandra, MongoDB, PostgreSQL, CockroachDB, MariaDB, and Elasticsearch. We also test three types of queries over the JSON documents and present the test results for a direct comparison of the performance of the selected databases. We conclude that by using a combination of a state-of-the-art scalable database, for which we recommend Cassandra, and Elasticsearch for search and analytics, we get a unique tool for efficiently storing and querying data whose schema can easily change along the way.

Keywords: Big Data · search · analytics · performance · Elasticsearch · Cassandra · MongoDB · PostgreSQL · MariaDB · CockroachDB · JSON.

1 Introduction

The Big Data field and Big Data processing have attracted a lot of attention in the past years. Many researchers are choosing this field to work in, and, more importantly, the industry is grasping Big Data and implementing it in everyday work. There are many Big Data oriented databases, mainly under the NoSQL name1. In fact, all the classic databases are also going after the Big Data segment, with additions to their products to match features from the newly created databases in the NoSQL field.

By using Big Data, we are now storing vast amounts of data, which is getting harder and harder to understand and query. Having clusters of databases, changing schemas, and inserting lots of data makes it difficult to continuously get knowledge from our data and effectively query our databases. In live systems, using Big Data means that we may have to (more frequently, must) change our data schema along the way. Some systems allow a dynamic schema in their collections, but some systems are strict and require data transformation with any schema change. From our experience, it is always better to allow dynamic changes of the schema, without the schema change impacting the old data in the collection.

1 NoSQL - acronym for Not Only SQL


This paper tries to connect these three things: Big Data, effective querying, and dynamic schemas. First, we test the load performance of JSON documents in the relational databases MariaDB, PostgreSQL, and CockroachDB, using their native JSON support. Then, we test the load performance of the same set of JSON documents on MongoDB and Cassandra, and additionally on the hybrid database/search tool Elasticsearch. The second set of tests measures the query performance on the loaded JSON documents. We prepared three queries, involving counting and filtering on fields in embedded JSON documents. Finally, we assess the ability to change the schema in the databases.

In Big Data, sometimes quick ingestion times are critical, and sometimes faster queries are more important. An important feature in some scenarios is the possibility to change the schema in the working data tables. This research tries to answer the question of which database to use in which scenario.

1.1 Related work

As Big Data becomes one of the top research fields, the number of papers related to the subject is growing very fast. Usually, the authors compare two databases with a similar purpose. In this section, we present the work related to our subject of research.

The authors in [10] try to improve Cassandra's performance by introducing spatial data indexing and retrieval. They use latitude and longitude to create a geohash attribute and develop a new querying system to optimize the number of queries. In [9] the authors show that Cassandra performs better than MongoDB. The comparison between these two NoSQL databases is made with several different types of CRUD workloads, and it can be concluded that MongoDB performs better for small amounts of data, but as the data size increases, its performance becomes very poor; Cassandra, however, gets faster. In [13] SQL and NoSQL databases are compared, represented by PostgreSQL and MongoDB respectively. It is concluded that in general MongoDB shows better performance; however, the select operations of PostgreSQL can be improved by indexing.

There are also some articles that compare Elasticsearch with other databases with similar characteristics. For example, in [11] and [12] the authors compared CouchDB and Elasticsearch and found that CouchDB performs better in CRUD operations, but Elasticsearch is far better in select and search. The article [14] shows the difference in performance between Elasticsearch and Neo4j applied to 100,000 test records from a Twitter database. The experimental results show that Neo4j is efficient in both storage and data retrieval, but Elasticsearch handles increasing workloads better because of its high scalability.
Our tests will expand the number of databases in the performance analysis: we will use three relational databases, two NoSQL databases, and one hybrid database/search and analytics tool.

1.2 Paper organization

The first section is the introduction to the topic of the paper and the related work. The second section presents the databases that we will test and gives background on the test JSON documents in the dataset we will use across the databases. The third section


sets the test scenarios and the metrics used for the tests. The results of the executed tests are presented in the fourth section, and the final section concludes the topic of the paper and introduces an idea for combining database and query technologies for best performance.

2 Big Data database choices

Big Data poses big challenges to the industry. Implementing Big Data means having the capability to process, store, and query huge quantities of data. Classic relational databases can store such quantities of data, but their setup and maintenance can become tedious. The newer NoSQL databases are designed from the ground up for Big Data: they are easier to set up, and scaling instances of the database up and down is almost automatic.

2.1 JSON support in databases

JSON2 has become the de facto standard for storing unstructured data in a plain-text format. Such unstructured data can represent structures more complex than the one-dimensional rows of traditional relational databases. For that reason, JSON is the most used format in Big Data databases, and for the same reason traditional databases are also implementing ways of storing JSON objects, trying to stay on par with the new NoSQL technologies. In the following paragraphs we explain the JSON capabilities of PostgreSQL, CockroachDB, MariaDB, Mongo and Cassandra.

PostgreSQL has supported JSON natively as a field type since version 9.2. Defining this format is as simple as declaring the field type as json; for example, the CREATE statement for a table orders with ID as primary key and info as a JSON string is "CREATE TABLE orders (ID serial NOT NULL PRIMARY KEY, info json NOT NULL);". Inserting JSON is equally simple: the JSON string is inserted like any other field type in an INSERT statement, for example "INSERT INTO orders (info) VALUES ('{"customer": "John Doe", "items": {"product": "Beer", "qty": 6}}');". The JSON field can be queried in several ways. The regular "SELECT info FROM orders;" returns the full JSON strings in the table. Using the "->" operator, for example "SELECT info -> 'customer' AS customer FROM orders;", we get all the customers from the JSONs in the table in JSON format. With the "->>" operator, for example "SELECT info ->> 'customer' AS customer FROM orders;", we get all the customers as text. In PostgreSQL, a WHERE clause can filter on internal fields of the JSON string; for example, the query "SELECT info ->> 'customer' AS customer FROM orders WHERE info -> 'items' ->> 'product' = 'Diaper';" answers who bought "Diaper". Aggregations are also supported, as well as some more advanced JSON operators [8]. CockroachDB implements the PostgreSQL interface completely.
This means that we can use the same drivers and the same queries on CockroachDB as we do with PostgreSQL.
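The PostgreSQL operators above do not run verbatim outside PostgreSQL and CockroachDB, but the same query pattern can be sketched with SQLite's JSON1 functions, where json_extract plays the role of the -> / ->> operators. This is an illustrative analogue using Python's standard library, not the setup used in the paper:

```python
import json
import sqlite3

# In-memory database standing in for the orders table from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, info TEXT NOT NULL)")

order = {"customer": "John Doe", "items": {"product": "Beer", "qty": 6}}
conn.execute("INSERT INTO orders (info) VALUES (?)", (json.dumps(order),))

# json_extract(info, '$.customer') mirrors info ->> 'customer';
# the WHERE clause mirrors info -> 'items' ->> 'product' = 'Beer'.
row = conn.execute(
    "SELECT json_extract(info, '$.customer') FROM orders "
    "WHERE json_extract(info, '$.items.product') = 'Beer'"
).fetchone()
print(row[0])
```

The point of the pattern is the same in both systems: the JSON stays a single column, while the query engine reaches into its nested fields at query time.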

2 JSON - acronym for JavaScript Object Notation


MariaDB also supports JSON internally, starting from version 10.2.7. Internally, MariaDB uses LONGTEXT as the data type, aliased as JSON for compatibility with MySQL. MariaDB claims that the performance of its LONGTEXT data type for supporting JSON is at least as good as the native JSON of MySQL. MariaDB also provides native JSON functions that help with inserting and querying data [3].

MongoDB is a database designed from the ground up for storing documents, so it natively supports JSON as the primary data format of its database engine. The notion of a table from traditional databases is here called a collection; a MongoDB collection is the first level of logical organization for JSON strings, which MongoDB calls documents. By default, MongoDB supports a flexible schema: documents in one collection do not necessarily need to conform to the same specification. MongoDB supports relations between documents in the same or different collections, which removes the need to duplicate referenced JSON data and directly leads to smaller databases for the same datasets. MongoDB provides drivers for all popular programming languages [7]. Collections can be created in two ways: implicitly, by directly adding a document with "db.orders.insert({"customer": "John Doe", "items": {"product": "Beer", "qty": 6}})", or explicitly, by calling the createCollection function "db.createCollection("orders")". The explicit way supports adding additional advanced options to the collection being created [5]. Inserts are done by calling the insertOne or insertMany functions, and queries by calling the find function, which supports filtering and embedded/nested search [6]. MongoDB supports aggregations on the data natively, providing real-time computed results [4].

Cassandra supports JSON natively since version 2.2 [1].
However, this native JSON support merely extends the table schema, allowing us to insert a JSON document that conforms to a specific table schema. The only advantage of using JSON with Cassandra is that the client does not need to parse the JSON before inserting it: the database does this by matching the keys in the JSON against the table structure. The rest of the work with the JSON data is the same as classic table operations in Cassandra.
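MongoDB's embedded/nested filtering mentioned earlier can be mimicked in a few lines of plain Python. The toy find below is only an illustration of dot-notation matching over a list of dictionaries, not MongoDB's actual implementation:

```python
def find(collection, query):
    """Return documents matching a MongoDB-style dot-notation query."""
    def get(doc, dotted_key):
        # Walk nested dictionaries following "a.b.c"-style keys.
        for part in dotted_key.split("."):
            if not isinstance(doc, dict) or part not in doc:
                return None
            doc = doc[part]
        return doc
    return [d for d in collection
            if all(get(d, k) == v for k, v in query.items())]

orders = [
    {"customer": "John Doe", "items": {"product": "Beer", "qty": 6}},
    {"customer": "Jane Roe", "items": {"product": "Diaper", "qty": 2}},
]
print(find(orders, {"items.product": "Diaper"})[0]["customer"])  # Jane Roe
```

The flexible schema falls out naturally: a document missing the queried path simply fails the match instead of raising a schema error.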

2.2 Target scenario

Our target scenario is a very specific use case in which we want to store vast amounts of data (hence the Big Data tag) and to index and effectively query that data. We will select a dataset represented as JSON documents and test it on PostgreSQL, CockroachDB, MariaDB, Mongo, and Cassandra. The test will include inserts and queries of the data. Additionally, for the query part, we will index everything in Elasticsearch and measure its queries against the queries executed in the databases mentioned above. The results of the tests will give us deeper comparative knowledge about which database and query engine to choose when working with Big Data.

2.3 Test dataset

The dataset [2] we are using for this scenario is the Last.fm dataset3. It contains 839,122 JSON files and 943,347 tracks matched between the Million Song Dataset (MSD) and Last.fm,

3 This is the test dataset from last.fm


{
  "artist": "Casual",
  "timestamp": "2011-08-02 20:13:25.674526",
  "similars": [
    {"key": "TRABACN128F425B784", "value": 0.871737},
    {"key": "TRIAINV12903CB4943", "value": 0.751301}
  ],
  "tags": [
    {"key": "Bay Area", "value": "100"},
    {"key": "hieroglyiphics", "value": "100"}
  ],
  "track_id": "TRAAAAW128F429D538",
  "title": "I Didn't Mean To"
}

Listing 1: JSON file example

505,216 tracks with at least one tag, 584,897 tracks with at least one similar track, 522,366 unique tags, 8,598,630 track-tag pairs, and 56,506,688 track-similar-track pairs. An example of one file is shown in Listing 1. There are six different fields in the JSON files. The first one is artist, the artist who performs the song. Then comes timestamp, the date and time when the file was created (added to the database); track_id, the unique identifier of the song in the Last.fm database; title, the title of the song; similars, a list of (track_id, value) pairs representing the precomputed similarity between this song and other songs; and the last field is tags, a list of (key, value) pairs of song-level tags.
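Reading these fields out of one of the dataset's files is straightforward. The sketch below parses a trimmed copy of the Listing 1 example and picks the most similar track; the field names are those of the dataset, while the parsing code itself is only an illustration:

```python
import json

# A trimmed copy of the Listing 1 document.
raw = """
{"artist": "Casual",
 "timestamp": "2011-08-02 20:13:25.674526",
 "similars": [{"key": "TRABACN128F425B784", "value": 0.871737},
              {"key": "TRIAINV12903CB4943", "value": 0.751301}],
 "tags": [{"key": "Bay Area", "value": "100"}],
 "track_id": "TRAAAAW128F429D538",
 "title": "I Didn't Mean To"}
"""
doc = json.loads(raw)

# The most similar track is the entry with the highest similarity value.
best = max(doc["similars"], key=lambda s: s["value"])
print(doc["title"], "->", best["key"], best["value"])
```

Iterating the same two calls over all 839,122 files is essentially what every import script in the test has to do before handing the document to a database driver.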

3 Test scenarios and metrics

Our test scenario involves the test dataset from Last.fm, which contains 839,122 JSON documents, altogether 5 gigabytes of text data. The first step is to import the JSON documents into the databases, which is repeated 5 times to verify consistency. For easy maintenance of the test environment, we use official Docker images of the databases. For each database, we create a fresh Docker container and import the JSON dataset into it. It is worth noting that we use the default configuration parameters for each database. Before each import, we drop the tables and create them again. After each import, we run the three test queries over the imported data, again 5 times per query for consistency verification. The tests were executed on an Intel Core i7-8550U with four cores, 32 GB RAM and M.2 2280 NVMe storage. The operating system running Docker was Arch Linux with kernel version 4.19.36. The Docker version was 18.09.5-ce, and the docker-compose version was 1.23.2. The Python scripts were executed with Python 3.7.3.
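The repeat-and-average measurement described above can be expressed with a small helper. This is a generic sketch of the timing loop (the name average_runtime is ours), not the authors' actual test scripts:

```python
import statistics
import time

def average_runtime(operation, repeats=5):
    """Run `operation` `repeats` times; return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

# Example: time a stand-in workload instead of a real database import.
mean_seconds = average_runtime(lambda: sum(range(100_000)), repeats=5)
print(f"average over 5 runs: {mean_seconds:.6f} s")
```

In the real setup the lambda would be replaced by the import (or query) call against the freshly created container, and the container would be recreated between repeats.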


4 Results

In this section we present the results of testing the databases. The average load time for all databases is presented in Figure 1. We insert the whole dataset into the databases several times and compute the average load time in seconds. As we can see, there are three clusters when grouped by load time. The first contains Mongo and Cassandra, which have significantly better times than the others. CockroachDB and Elasticsearch form the second cluster and need a very long time to load the whole dataset. MariaDB and Postgres are in the third one, with times somewhere in the middle.

Fig. 1. Load time in seconds

Figure 2 shows the average execution time of each query. MariaDB is clearly the slowest in all of the queries. Postgres keeps an average between 15 and 25 seconds across the queries, with its best result in the first query and its worst in the second. Mongo shows pretty good results in the second and especially in the third query, but in the first it is worse than CockroachDB and Postgres. CockroachDB, on the other hand, shows good results in the first two queries and a very bad result in the last one. It is interesting that Mongo has its worst performance in the first query, Postgres in the second, and CockroachDB in the last one. The absolute winner in this measurement is Elasticsearch: it has by far the best times in all three queries. Averaged over all three queries, the time for MariaDB is 121 s, for Mongo 12 s, for CockroachDB 27 s, for Postgres 20 s, and for Elasticsearch just 0.35 s. Note that we did not measure times


for the Cassandra database, because it is not possible to perform these kinds of queries on it.
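The per-query averages quoted in the text put Elasticsearch's advantage in perspective. A quick computation with those numbers (taken directly from the results above) shows the relative slowdown of each engine:

```python
# Average query times in seconds, as reported in the results section.
avg_query_seconds = {
    "MariaDB": 121.0,
    "Mongo": 12.0,
    "CockroachDB": 27.0,
    "Postgres": 20.0,
    "Elasticsearch": 0.35,
}

base = avg_query_seconds["Elasticsearch"]
slowdown = {db: t / base for db, t in avg_query_seconds.items()
            if db != "Elasticsearch"}
for db, factor in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{db} is about {factor:.0f}x slower than Elasticsearch on average")
```

Even the best of the other engines, Mongo, averages roughly 34 times slower than Elasticsearch on these three queries.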

Fig. 2. Time of execution of queries

5 Conclusion

The results clearly show that the NoSQL databases are built for faster manipulation of unstructured data, such as the JSON documents in our tests. The import time of the test dataset is 521 seconds for Mongo and 261 seconds for Cassandra. The import time is 8016 seconds for Postgres, 8441 seconds for MariaDB and 18023 seconds for CockroachDB: 30-70 times slower than the Cassandra import and 15-35 times slower than the Mongo import. In the test with the three queries, Mongo leads with the fastest timings, except for the first query, which should probably be optimized in a more Mongo-native manner. It is worth noting that Cassandra could not execute the queries because it does not support native JSON query functionality. Elasticsearch, the hybrid database in which search and analytics are first-class citizens, is a special case that should be discussed separately. There are many discussions about whether Elasticsearch can be used as a database, be it SQL or NoSQL (see https://www.elastic.co/blog/found-elasticsearch-as-nosql). The consensus among experts is that Elasticsearch should not be used as a primary database, mostly because of its lack of elegant handling of OutOfMemory exceptions. There is also the lack of transactions and the lack of any authentication and authorization mechanisms,


at least in the open-source version of Elasticsearch. However, if we pair this excellent search engine and indexer with another excellent database that supports easy clustering for storing huge data, we would have a Big Data dream team. Such a technology would be Cassandra: a Big Data store that can manage massive amounts of data, supports fault tolerance by replicating data, and scales to thousands of nodes in one cluster. Apache Cassandra is a proven technology in the industry, used by Apple, Netflix, CERN, Reddit, Instagram and many more big and small companies that leverage Big Data. The field in which Apache Cassandra shines most is the easy storage and usage of petabytes of data across a cluster of nodes installed on commodity hardware. Our idea of pairing Cassandra and Elasticsearch means that we would use Cassandra for storing JSON documents in a purely string manner and, at the same time, index each JSON document in Elasticsearch for search and analytics. The string representation of the JSON documents in Cassandra leads to a flexible schema: Cassandra would not care if we changed the JSON documents. However, since Elasticsearch needs a schema, we would create a new index for each change of the JSON schema, and if all documents are needed in the same index, reindex the old-schema documents with a possible transformation in the pipeline.
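The proposed pairing amounts to a dual-write: the raw JSON string goes into the store (Cassandra's role) while selected fields go into an inverted index (Elasticsearch's role). Everything below is a toy in-memory illustration of that architecture, not a driver for either system:

```python
import json
from collections import defaultdict

store = {}                # plays the role of Cassandra: key -> raw JSON string
index = defaultdict(set)  # plays the role of Elasticsearch: term -> document keys

def ingest(doc):
    """Dual-write: store the document opaquely, index its searchable fields."""
    key = doc["track_id"]
    store[key] = json.dumps(doc)          # schema-free string storage
    for word in doc["title"].lower().split():
        index[word].add(key)              # separate, schema-aware search index

def search(word):
    """Resolve the index hit back to full documents from the store."""
    return [json.loads(store[k]) for k in index[word.lower()]]

ingest({"track_id": "TRAAAAW128F429D538",
        "title": "I Didn't Mean To",
        "artist": "Casual"})
print(search("mean")[0]["artist"])  # Casual
```

The split mirrors the conclusion's argument: the store never needs to understand the documents, so schema changes only ever touch the index side.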

References

1. Cassandra JSON support. https://www.datastax.com/dev/blog/whats-new-in-cassandra-2-2-json-support, accessed: 2019-07-22
2. The Last.fm dataset. http://millionsongdataset.com/lastfm/, accessed: 2019-07-22
3. MariaDB JSON functions. https://mariadb.com/kb/en/library/json-functions/, accessed: 2019-07-22
4. Mongo - aggregation. https://docs.mongodb.com/manual/aggregation/, accessed: 2019-07-22
5. Mongo - create database collection. https://docs.mongodb.com/manual/reference/method/db.createCollection/, accessed: 2019-07-22
6. Mongo - CRUD query documents. https://docs.mongodb.com/manual/tutorial/query-documents/, accessed: 2019-07-22
7. MongoDB drivers and ODM. https://docs.mongodb.com/ecosystem/drivers/, accessed: 2019-07-22
8. PostgreSQL JSON functions and operators. https://www.postgresql.org/docs/current/functions-json.html, accessed: 2019-07-22
9. Abramova, V., Bernardino, J.: NoSQL databases: MongoDB vs Cassandra. In: Proceedings of the International C* Conference on Computer Science and Software Engineering. pp. 14-22. C3S2E '13, ACM, New York, NY, USA (2013), http://doi.acm.org/10.1145/2494444.2494447
10. Ben Brahim, M., Drira, W., Filali, F., Hamdi, N.: Spatial data extension for Cassandra NoSQL database. Journal of Big Data 3(1), 11 (Jun 2016), https://doi.org/10.1186/s40537-016-0045-4
11. Gupta, S., Rani, R.: A comparative study of Elasticsearch and CouchDB document oriented databases. In: 2016 International Conference on Inventive Computation Technologies (ICICT). vol. 1, pp. 1-4 (Aug 2016)
12. Gupta, S., Rani, R.: Data transformation and query analysis of Elasticsearch and CouchDB document oriented databases. http://tudr.thapar.edu:8080/jspui/handle/10266/4215


13. Jung, M., Youn, S., Bae, J., Choi, Y.: A study on data input and output performance comparison of MongoDB and PostgreSQL in the big data environment. In: 2015 8th International Conference on Database Theory and Application (DTA). pp. 14-17 (Nov 2015)
14. Zhu, J., Tirumala, S.S., Anjan Babu, G.: A technical evaluation of Neo4j and Elasticsearch for mining Twitter data. In: Singh, M., Gupta, P.K., Tyagi, V., Flusser, J., Ören, T. (eds.) Advances in Computing and Data Sciences. pp. 359-369. Springer Singapore, Singapore (2018)


Second Sheld-on conference meeting Solutions for ageing well at home, in the community and at work

COST Action CA16226 Indoor living space improvement: Smart Habitat for the Elderly

October 17, 2019 Ohrid (Republic of North Macedonia)

Scientific Committee

Aleksandra Popovska Mitrovikj – North Macedonia

Petre Lameski – North Macedonia

Ivan Chorbev – North Macedonia

Eftim Zdravevski – North Macedonia

Rafael Maestre – Spain

Jake Kaner – United Kingdom

Birgitta Langhammer - Norway

Michal Issacson – Israel

Nuno Garcia – Portugal

Georgios Ntalos – Greece

Carlos Valderrama – Belgium

Ciprian Dobre - Romania

Michael Burnard – Slovenia

Francisco Melero – Spain

Telepresence mobile robot platform for elderly care

Natasa Koceska¹, Saso Koceski¹

1 Faculty of Computer Science, University Goce Delcev - Stip, Stip, R. North Macedonia [email protected], [email protected]

Keywords: elderly care, assistive robotics, telepresence robot


Recent demographic trends in most countries show that the percentage of elderly people is rapidly increasing. According to the World Health Organization (WHO), by 2050 the number of people in the world aged over 60 is projected to more than double, from 605 million to 2 billion, representing 16% of the world's population.

These trends introduce multiple social, economic and cultural challenges, not only for the elderly but also for their families and the entire society. On the other hand, the lack of qualified health-care personnel, and the high expectations placed on services that should meet the demands of the elderly in order to prolong their independence, have already placed an increased burden on the health-care sector in many countries.

An ageing society demands technological solutions that help to overcome these issues. Robotic assistants used in elderly and disabled care have emerged as a potential solution. Various personal service robots are already available on the market. Many of them, such as PEARL, CompanionAble, CareBot, Kompai, Care-o-Bot and PR2 (Sharkey and Sharkey, 2011), are designed to perform specific tasks including manipulating objects, reminding about medications, maintaining a shopping list, emergency notification, and navigation. In addition to their high cost, these robots have their own limitations, such as the absence of both social and health-care capabilities, inability to pick up objects from high shelves, and problems with thresholds during indoor navigation (Koceska et al. 2019).

Figure 1: Developed assistive telepresence robot for elderly health care.

Having in mind these limitations of the existing solutions, we have designed and developed a low-cost assistive telepresence robot system for facilitating the health care of the elderly and improving the quality of life of the elderly and handicapped (Figure 1). The developed robot is composed of four main functional hardware units: a wheeled robot base capable of steering,

a robot body containing a linear actuator, a robot arm with 6 degrees of freedom (DOF), and a robot head consisting of a tablet and camera for remote communication.

Along with smart navigation functionalities, the robot permits various interactions in a remote environment, such as navigation, fetching and carrying small objects, measuring the vital parameters of an elderly person, reminders, a calendar, and interpersonal communication. The potential users of the robot system are not only the elderly but also professional caregivers. The robot can be remotely controlled by a distant person (e.g., a professional caregiver), who can then perform some activities as if he/she were physically present at the elderly person's residence (Koceska et al. 2019).

The robot's control architecture, its various command interfaces (Koceski et al. 2012), (Koceski and Koceska 2010), and the robot's functionalities have been experimentally evaluated. The conducted evaluation studies demonstrated that the core functionalities provided by the developed telepresence robot system are accepted by potential users (Koceski and Koceska 2016).

References

• Koceska, N., Koceski, S., Beomonte Zobel, P., Trajkovik, V. and Garcia, N., 2019. A Telemedicine Robot System for Assisted and Independent Living. Sensors, 19(4), p.834.
• Koceski, S. and Koceska, N., 2010, June. Vision-based gesture recognition for human-computer interaction and mobile robot's freight ramp control. In Proceedings of the ITI 2010, 32nd International Conference on Information Technology Interfaces (pp. 289-294). IEEE.
• Koceski, S., Koceska, N. and Kocev, I., 2012. Design and evaluation of cell phone pointing interface for robot control. International Journal of Advanced Robotic Systems, 9(4), p.135.
• Koceski, S. and Koceska, N., 2016. Evaluation of an assistive telepresence robot for elderly healthcare. Journal of Medical Systems, 40(5), p.121.
• Sharkey, A. and Sharkey, N., 2011. Children, the elderly, and interactive robots. IEEE Robotics & Automation Magazine, 18(1), pp.32-38.

BIM & AAL

Volker Koch, Maria Victoria Gómez Gómez, Petra von Both

Karlsruhe, Germany, Karlsruhe Institute of Technology (KIT) [email protected], [email protected],[email protected]

Keywords: building information modeling BIM, ambient assistant living AAL


Building Information Modeling (BIM): the design and operation of buildings is today characterized in the professional world by the concept of building information modeling. The term BIM describes a planning method which, ideally, can digitally model, link and display all relevant building information over the entire lifecycle of a building. The actors involved, their responsibilities and their interaction are understood as an essential part of the processes modeled in BIM (Eastman 2011). The term Ambient Assisted Living (AAL), on the other hand, summarizes methods, systems and services designed to enable people with reduced mobility to live as long and as independently as possible in their familiar environment. They are supported as discreetly as possible by electronic systems integrated into the living environment and by concepts based on them (Rashidi, 2013).

In recent years, the BIM method has proven to be a promising method for the efficient planning, construction and operation of buildings. It can be seen as a consistent continuation of the digitalization of building planning and benefits from earlier developments in the automotive industry. Buildings, however, differ in some essential respects from other industrial products. They are always unique, are characterised by a very high number of players involved, have an exceptionally long service life as products, and must be able to adapt constantly to the changing requirements of their users during this time. It is the interaction with users in particular that presents buildings with constantly growing challenges. Rapidly changing technical infrastructure developments (e.g. the Smart House) and changing user profiles due to social changes play a decisive role here.
In this context, demographic development is crucial: more and more people are expressing the desire to be able to remain as self-determined as possible in their old age in their familiar living spaces and to be able to lead as independent a life as possible even with limited mobility and motoric functions.

These considerations give rise to three central questions for the training of construction experts: What requirements arise from these processes for future buildings? Why can the principles of the BIM and AAL methods support this process well? How should a didactic system be structured so that it can effectively support these requirements?

What are the requirements for the construction processes and for the buildings themselves?

• Buildings must be able to react flexibly and effectively to social and technical changes. The possibility of adaptation must primarily be shown in the flexible design of floor plans and result in individually designable and scalable spaces.
• In the future, buildings must be able to interact with their users automatically at a high level. This requires a sophisticated system of sensors and actuators.
• Automation must be configurable and responsive to the individual needs of each user.

The BIM method offers the best conditions for supporting an interdisciplinary and heterogeneous working group in the planning of a complex, unique product. It builds a consistent and comprehensive digital data model and ideally manages all relevant semantic information about materials, procedures, processes and the actors involved, in addition to the geometric-spatial information. The data basis can be kept in open-source formats and made accessible to the actors on a rule-based basis. The requirements of Ambient Assisted Living concepts can be integrated both technically and methodically into BIM.

Buildings that have to meet these requirements place particularly high demands on the interaction of planners and on their training. This training can only succeed if it has the following properties:

• The accumulation of theoretical knowledge alone is not sufficient to meet the interdisciplinary requirements of these new planning tasks.
• The planners must take on different roles as planners and stakeholders in group work and train their ability to act in an interdisciplinary environment.
• The training must initiate a significant change in the way the participants work. Open cooperation and joint action should replace the solitary way of working that is common today.
• The cooperation must take place already in the early planning phases, since this is when the possibilities for influencing the construction processes are greatest.
• Due to the rapid development of technological issues, experts must be able to quickly acquire new knowledge and be ready for lifelong learning and learning-on-demand concepts.

The combination of BIM and AAL methods is currently being built into an open learning platform in the EU-funded Erasmus+ project ESSENSE (Essense 2019).

References

• Eastman, C., Teicholz, P., Sacks, R., Liston, K. BIM Handbook: A Guide to Building Information Modeling for Owners, Managers, Designers, Engineers and Contractors. Wiley, 2011
• Rashidi, P., Mihailidis, A. A Survey on Ambient-Assisted Living Tools for Older Adults. IEEE Journal of Biomedical and Health Informatics, vol. 17, no. 3, pp. 579-590, 2013
• Essense - Education Supporting Smart Environments for Seniors, retrieved on 21.08.2019

Acknowledgments

The project ESSENSE is co-funded by the Erasmus+ Programme of the European Union (2018-1-DE01-KA203-004292)

Interactive textiles for designing adaptive space for elderly

Amine Haj Taieb, Amal Lajmi

ISAMS Tunisia LGTex [email protected], [email protected]

Keywords: interactive textiles, space design, elderly, comfort


Interdisciplinary research between interior design, textile materials and health-care staff focuses on the different functional and aesthetic qualities of an interior space with interactive textiles, such as the conception of furniture physically and psychologically adapted for the elderly, taking into account criteria and components like accessibility, sound regulation, music therapy, light therapy, thermal comfort and usability. In this study we present some recently developed smart-textile initiatives for the elderly, and we determine the potential of interactive textiles for designing multi-sensory environments and their effects on the treatment and therapy of dementias such as Alzheimer's disease.

Solutions in built environment to prevent social isolation

Veronika Kotradyova

Faculty of Architecture, Slovak University of Technology in Bratislava, BCDlab, [email protected]

Keywords: social isolation, built environment, common space, social innovation, space arrangement


The built environment influences human behaviour, well-being and health through long-term interaction. Space arrangement, furnishing and the program of a space can help to prevent or diminish behaviour and phenomena in society with a negative impact on public health, including social exclusion and isolation, one of the most crucial problems of the elderly.

In the past, older adults had a stable and respected role in the family, living in bigger multigenerational families, neighbourhoods and communities based on unity, solidarity and belonging, related to traditional socio-cultural values. With increasing individualism, the modern nomadism of the productive population, and the practice of self-optimization and hedonism in societies with advanced economies, traditional family systems are endangered.

In modern societies it is hard to return to the full traditional understanding of multigenerational housing, but it is possible to create housing concepts for the co-living of several generations: conscious social and material environmental design that encourages a range of behaviours leading to socialization, combined with sufficient privacy, depending very much on the social and political system, cultural differences and personal economic situation. Social and physical inclusion, as the prevention of social exclusion and loneliness, can lead to the reduction of health problems that often have a mental/psychosomatic background.

It can be directly or indirectly supported by the built environment.

Figure 1: Mixed program for a public building: kindergarten and day care for the elderly - St. Vincent in Seattle.

At the macro-environment level, a solution can be found in providing a diversity of multipurpose buildings with mixed programmes that bring together people of different ages and occupations, as well as multigenerational housing concepts, where switching between being at home and being in attractive and accessible public space is possible in a flexible way, and the ratio of privacy and socialization through natural meetings and gathering is adaptable.


On the microscale, a sociopetal space arrangement supporting socialization and communication, with possibilities of getting together, can have a positive impact on inclusion: sharing and supporting each other on an everyday basis in common shared spaces that are easily available. The exchange of knowledge, insights and energy between older adults and children or youth is a contribution for both sides. This means creating and providing attractive and accessible common shared housing as well as public and semipublic space around housing or day-care units, without fences or other visual and physical barriers. With the contemporary trend of supporting concepts of joined smaller housing units with the possibility of home day care, it is crucial to have not only shared space to meet and socialize naturally, by passing through and getting to some point of everyday use, but also a private room sized to accommodate visitors, where intimacy is a must but which at the same time supports socialization and communication, including meeting family who can feel comfortable and welcome during their visit. An important issue is good access to sanitary equipment: a safe and welcoming shower, in order to feel clean and to have self-confidence when socializing.

Space concepts are more successful in the prevention of social isolation if they are supported by appropriate programmes based on the neighbourhood community, professional day care, social workers and volunteers, e.g. a programme for getting seniors out for a ride on special bikes adapted for pushing wheelchairs in Bratislava, Slovakia. The most important issue here is adaptability: flexibility in housing concepts is needed, since society and the life stories of its members are in constant change. At the same time it is also necessary to reckon with obstacles like stereotypes and fear of change.

Social isolation can arise not only from staying at home, but also in institutional care, while closer and more fruitful inter-human relationships are possible rather in smaller groups.

Figure 1: "Stacionár" - a day care centre located directly in the facilities of the grammar school in the village Hrušov in south-central Slovakia enables contact with children; seniors spend their daytime there with handwork and crafts, producing different small products, also for sale.

According to the environmental psychologist Robert Gifford (1996), place attachment represents a deep experience of feeling part of a place. It is related to the richness of meaning and sense that develops out of acquaintance with a place as the place becomes more familiar. This attachment can be to our homes, properties, communities or local nature sceneries and settings. Where the attachment rises, the intensity and meaning of the place and the meaning of Self become affiliated. The meaning of the place can then become so strong that self-identity starts to be restricted by the place. On a smaller scale, many people identify with their neighbourhood, quarter, village, farm, house and rooms, and being separated from this bond can lead to social exclusion. At the same time, coming into a new social group can be a challenge and a stimulation for further personal development.

To demonstrate these ideas, the paper presents studies from the field of social architecture and design. Many organizations dealing with such projects are gathered in the DESIS Network (Design for Social Innovation and Sustainability), which provides a platform for social innovation projects in architecture and design, presenting projects and shared experiences of authors and coordinators. One interesting project is CULBURB (Cultural Acupuncture Treatment for Suburbs); related projects were also presented at the Biennale of Architecture in Venice in 2018. This kind of approach is also significantly supported by study programmes of design and architecture schools such as the Institute without Boundaries, Rural Studio, and the School of Visual Arts, New York, with its MFA in Design for Social Innovation, where experience-based programmes deal with social design in practice.

References

• Čerešňová, Zuzana. Interakcia človeka s prostredím. In ALFA, Roč. 22, č. 3–4 (2017), s. 4–17. ISSN 1135-2679
• Kotradyová, V. (2015) Komfort v mikroprostredí / Comfort in Microenvironment, Premedia, Bratislava
• Gifford, R. (1996) Environmental Psychology. Allyn and Bacon

Acknowledgments

This work was supported by the project APVV 16-0567 Identity SK – a common platform for design, architecture and social sciences, and by COST Action CA16226 SHELDON – Smart Habitat for the Elderly.

Smart furniture definition based on key technologies specification in context of literature and patent analysis

Ondrej Krejcar 1, Petra Maresova 1, Francisco José Melero 2,3, Sabina Barakovic 4, Jasmina Barakovic Husic 5, Kamil Kuca 1

1 Faculty of Informatics and Management, University of Hradec Kralove, Hradec Kralove, Czech Republic 2 Technical Research Centre of Furniture and Wood of the Region of Murcia, C/Perales S/N, 30510 Yecla, Spain 3 Technical University of Cartagena, Telecommunication Networks Engineering Group, Cartagena 30202, Spain 4 Faculty of Transport and Communications, University of Sarajevo, Sarajevo, Bosnia and Herzegovina 5 Faculty of Electrical Engineering, University of Sarajevo, Sarajevo, Bosnia and Herzegovina

[email protected], [email protected], [email protected], [email protected], jasmina_barakovic@yahoo.com, [email protected]

Keywords: smart furniture, key technologies, definition, systematic literature and patent analysis


"Smart Furniture" is a widely used term nowadays, related to current trends such as digitization, the smart city, and the Internet of Things. Within these phenomena, Smart Furniture is used in different contexts, so its concept is not clarified. The aim of this article is to put the definition of Smart Furniture into the context of key technologies identified by an up-to-date literature and patent analysis. The definition of Smart Furniture builds on previous research (Krejcar et al. 2019) undertaken through searches of scientific and patent databases; the term is thus defined by its technical properties and parameters. Based on a deep analysis of selected articles and patent activity, the authors (Krejcar et al. 2019) arrived at the following definition of "Smart Furniture": "Smart Furniture is designed, networked furniture that is equipped with an intelligent system or controller operated with the user's data and energy sources. Smart Furniture is able to communicate and anticipate the user's needs using a plurality of sensors and actuators inside the user's environment, resulting in a form of user-adapted furniture or an environment that satisfies the user-declared needs and non-declared needs for the purpose of improving their quality of life in a smart world." (Krejcar et al. 2019)

This definition is put into the context of current trends in article and patent content, together with selected future trends. The article deals with current trends in the literature and patents, which can be uncovered with the help of new analytical software solutions.

Smart City and Smart Home are current trends in research, industry, and the market, connected to the fast-growing Industry 4.0 and IoT trends. While Smart City, Smart Home, and other components are already well established and defined, Smart Furniture is not as clearly described and defined, since it is a multi-scope topic covering the furniture industry and design trends on the one hand, and electronics and ICT on the other (Fig. 1 Left).


Figure 1: Left – Smart Furniture as a component of the Smart City umbrella (modified from Krejcar et al. 2019). Right – Code map (CPC) for the patent pool of the phrase "Smart Furniture" in Title, Abstract, Claims, and Description.

Methods: A literature search was undertaken between 20 August 2019 and 04 September 2019 in the Web of Science database, using keywords that included the phrase "Smart Furniture". Patent searching was performed in the PatentInspiration database. In total, 45 articles from the scientific database and 146 patent applications were examined against strict criteria containing meaningful definitions of Smart Furniture.

Based on the analysis of key technologies and properties, clustering of results, and their further analysis, it was found that the concept of Smart Furniture is specific to the following components: an intelligent system, a controller operated with the user's data and energy sources, sensors, and actuators.

The analysis is based on an all-years search for the strict phrase "Smart Furniture" in TOPIC, which covers Title, Abstract, and Keywords in the ISI WoK database, and in Title, Abstract, Claims, and Description in the PatentInspiration database. Records in ISI WoK are available between the years 2010 and 2019, 45 records in total. The patent search in PatentInspiration with the exact phrase "Smart Furniture" in Title, Abstract, Claims, or Description yields 146 patent applications dating from 2002 onwards (Fig. 3). 141 records are available with an abstract, while 61 of them have been granted.

The literature search can be analyzed using clustering; we use the VOSviewer software, which provides visually indicative clusters. Another possible view is based on keyword analysis and parameters, which is common to both articles and patents. Patent activity can be further analyzed using special codes (IPC and CPC), which indicate the area of technology used inside the patent text and description or covered by the patent claims.

Patent analysis: One of the most important pieces of information in a filed patent application is the CPC (Cooperative Patent Classification) code, which is assigned by the patent authority after initial screening. The CPC scheme is accompanied by a set of CPC Definitions, documents that explain how to use the CPC scheme for classifying and searching a specific technology. These CPC codes show where the target areas of the patent applications in the patent pool (the result of the patent search) lie – in our case, for 146 patent applications.

The CPC code map shows the similarity between patents and codes graphically, since the human mind is used to maps and can readily use distance between two items to understand how related they are. This analysis indicates which CPC codes are used in our patent pool (Fig. 1 Right). The most used CPC codes in our pool also vary in time: over the last four years the most patented topics cover speech recognition (G10L15/00), followed by input arrangements for transferring data (G06F3/00) and information retrieval (G06F16/00).

Based on the pool analysis, one of the most important results can be seen in the evolutionary potential analysis, where the properties most relevant in the patent texts have been extracted and described, along with the properties that have not yet been explored. This is possible because every property is backed up with a list of synonyms (Fig. 2 Left).


Figure 2: Left – Evolutionary potential analysis for the patent pool of the phrase "Smart Furniture" in Title, Abstract, Claims, and Description for all years (2002–2019). Right – Cluster analysis for publication records of the phrase "Smart Furniture" in Title, Abstract, and Keywords for all years (2010–2019).

ISI Web of Science: The database search provided 45 records from authors with affiliations in countries including the USA (7), Japan (6), Germany (5), Greece and Romania (4 each), and the Czech Republic, Italy, and Lithuania (3 each). These records are mostly in computer science and electrical engineering.

The analysis in Fig. 2 Right shows the result of a cluster analysis based on the 45 ISI WoK records with keywords. There is a strong connection between Smart Furniture and the Smart City and Smart Home topics. The next very intensive connection is to market research and the furniture industry. The analysis also makes evident that Smart Furniture is well surrounded by ICT technologies such as IoT, Bluetooth, capacitive proximity sensors, and interactive tabletops. The articles are also well connected to environments such as Active Assisted Living and adaptive environments.

Discussion and conclusion: Based on the specification of key technologies, clustering of results, and their further analysis, it was found that the concept of Smart Furniture is specific to the following components: an intelligent system, a controller operated with the user's data and energy sources, sensors, and actuators. Current trends and technologies are also shown, as well as areas where there is still space for innovation.

References

• O. Krejcar et al., “Smart Furniture as a Component of a Smart City—Definition Based on Key Technologies Specification,” in IEEE Access, vol. 7, pp. 94822-94839, 2019. doi: 10.1109/ACCESS.2019.2927778

A short history of IoT architectures through real AAL cases

Pedro C. López-García, Andrés L. Bleda-Tomás, Miguel Ángel Beteta-Medina, Jose Arturo Vidal-Poveda, Francisco J. Melero-Muñoz, Rafael Maestre-Ferriz

Electronics and Home Automation Department - Technological Research Centre of Furniture and Wood (CETEM) - C/Perales s/n, Yecla - 30510, Spain [email protected] , [email protected], [email protected], [email protected], [email protected], [email protected]

Keywords: AAL, AmI, IoT architectures, WSNs, eHealth


Summary: Advanced economies will have to deal with the assistance needs of their growing aging populations, and technological solutions such as the IoT (Internet of Things) for AAL (Active and Assisted Living) can address this challenge efficiently. This paper presents a history of IoT architectures in AAL applications through the lens of CETEM's real experiences in selected R&D projects, which evolved in parallel with technological solutions and approaches. Manufacturing companies, customers' needs, and end-users have been drivers in this process. The paper presents the different architectural approaches with their main pros and cons, hoping to serve as a reference for future designs.

1. Introduction

During the last decades, ICTs (Information and Communication Technologies) have permeated most of our daily lives, becoming an integral part of our lifestyle, although the real benefit to our societies is sometimes questionable. Care of older adults is one of the areas that can clearly benefit from AAL technologies. At the same time, our economies need to leverage AAL technologies to address the exponential growth of healthcare costs that comes with our aging populations.

"IoT" is a relatively new term that refers to the interconnection via the Internet of all kinds of objects, which need to have or integrate computing and communication capabilities. IoT has been successfully applied to AAL, and this paper provides a short description of the evolution of IoT projects carried out in CETEM's history. It is worth noting that the latest solutions are not always the best fit, and that some value may remain in older approaches.

2. WSNs with local server

Around a decade ago, the concept of WSNs (Wireless Sensor Networks) was everywhere. It was used in the project called "Smart Sensory Furniture" (SSF) (2011–2012), where traditional furniture was augmented by the integration of sensor nodes (aka motes), which included different sensors (Botia 2012), some modest computing and control capabilities, wireless communications (Zigbee) and, in some cases, actuators (e.g. adjustable beds or lighting).

Figure 1: SSF architecture (hardware layer and middle-level software layer: open context platform, adapters, management of inconsistencies, and management of the dynamics of the user model ontology)

The motes were arranged as a 2.4 GHz Zigbee mesh with a coordinator node, which was locally connected to the "AmI station", a small PC-based Linux machine. Data was analyzed automatically by the AmI engine so as to find patterns of normal behavior. If current behavior differed significantly, anomalies or unusual situations were detected, and a remote server in a kind of call center was notified, where a person should take action. See (Bleda 2017) for more detailed information.
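The abstract does not detail the AmI engine's pattern analysis; as a purely illustrative sketch of a deviation-from-baseline check of this kind (the data, threshold, and function name are hypothetical, not the actual engine):

```python
from statistics import mean, stdev

def detect_anomaly(history, current, z_threshold=3.0):
    """Flag `current` as anomalous if it deviates from the baseline
    built from `history` by more than `z_threshold` standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant baseline: any deviation is an anomaly.
        return current != mu
    return abs(current - mu) / sigma > z_threshold

# Daily motion-sensor activation counts for one resident (hypothetical data).
baseline = [42, 45, 39, 44, 41, 43, 40]
normal_day = detect_anomaly(baseline, 44)    # typical activity: no alert
quiet_day = detect_anomaly(baseline, 5)      # very low activity: notify the call centre
```

In a real deployment the baseline would of course be richer than a single count (time-of-day profiles, multiple sensors), but the alert logic follows the same deviation-from-normal idea.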

The previous approach required a dedicated network to be deployed, set up, and maintained, and it implied an external service provider, similar to a call center. Therefore, the obvious evolution for a residence-like environment was developed in the "AmICare" project, which used the omnipresent WiFi and a cheap Raspberry Pi 3 (Bleda 2018). The nodes can be distributed anywhere with a WiFi connection. Any device with a browser can connect to the server for information and notifications.

Figure 2: AmICare architecture (hardware layer, middle-level software layer, database, and user interface)

3. Single device system thanks to the cloud

The SSF and AmICare approaches require configuring a local network and server. In the "Asistae" project (2016) we aimed at minimizing the complications at the end-user side by providing a single plug-and-play device that included an active GSM connection. The device first connects to a cloud server; the user only needs to install a smartphone app and configure it with information and preferences. The complexity is moved to the hardware and cloud sides, and it can be updated to new requirements anytime. The architecture is made of three parts: integrated sensors, a mobile app, and a cloud server. See for product info .

Figure 3: Device and cloud (a device with integrated sensors connected to the cloud)

Note that a hybrid architecture between sections 2 and 3 can be obtained by moving the local servers of section 2 to the cloud.

4. Conclusions

This paper presents the evolution of key IoT projects carried out by CETEM during the current decade within the home-habitat sector. The first example presented was designed for a scalable set of independent homes with a dedicated Zigbee network, whereas the second was designed for a residence-like environment using the existing WiFi. Then a single-device solution is described, ideal for making independent smart furniture pieces. Finally, a hybrid solution is proposed. This work can serve as a quick reference of examples for guiding future designs.

References

• Bleda, A. L., Fernández-Luque, F. J., Rosa, A., Zapata, J., & Maestre, R. (2017). Smart sensory furniture based on WSN for ambient assisted living. IEEE Sensors Journal, 17(17), 5626–5636.
• Bleda, A. L., Maestre, R., Beteta, M. A., & Vidal, J. A. (2018, October). AmICare: Ambient Intelligent and Assistive System for Caregivers Support. In 2018 IEEE 16th International Conference on Embedded and Ubiquitous Computing (EUC) (pp. 66–73). IEEE.
• Botia, Juan A., Ana Villa, & Jose Palma. (2012). Ambient assisted living system for in-home monitoring of healthy independent elders. Expert Systems with Applications, 39(9), 8136–8148.

Resource allocation using network centrality in the AAL environment

Katerina Papanikolaou 1, Constandinos Mavromoustakis 2, Ciprian Dobre 3

1 European University Cyprus, Nicosia, Cyprus, Department of Computer Science and Engineering, 2 University of Nicosia, Nicosia, Cyprus, Department of Computer Science, 3 University Politehnica of Bucharest, Splaiul Independentei 313, Bucharest, Romania

[email protected], [email protected], [email protected]

Keywords: Content distribution in AAL, network centrality, social centrality


As an ever-increasing number of Internet users become members of one or even many social networks, we investigate how this social proximity can be used to benefit online applications, particularly Ambient Assisted Living applications with their particular requirements.

The concept of social centrality was developed by researchers who sought insight into how social power is acquired within a group. The concept embodies the notion that a user positioned between many user pairs forms an indispensable part of the network.

A centrality measure is used to identify the most important node within a graph or network. Identifying an important node can have multiple applications, depending on how importance is defined. In networks such as the Internet, centrality is also used to help identify key spreaders or key repositories of content. In more specialized networks, such as Ambient Assisted Living (AAL) networks, these key spreaders are even more significant. Additionally, these nodes can be used to identify key paths and routes for maintaining network connectivity under varying conditions. Centrality can be used either as an indicator of importance or as a measure of the potential influence a node may have on its peers, the two differing in functionality.

The Ambient Assisted Living environment is viewed as an area where users often share one common characteristic, such as age, and many belong to multiple, closely connected social networks. As AAL applications are envisioned to carry important data for their users, optimal and resilient data placement is required.

We present here preliminary work on improving data caching in multiple socially formed networks in the AAL environment by using the concept of centrality, in order to improve data access speed, availability, and survivability.

Socially aware resource allocation strategy

The concept of centrality comes from graph theory; translated to social networking, it is used to identify nodes of particular importance within a social network. Different types of centrality have been identified over the years, all based on network and path metrics among network nodes. One of these definitions is betweenness centrality, a measure based on shortest paths, proposed by Linton Freeman in 1977 (Freeman 1977). In our work we extend this concept to parallel betweenness centrality.
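For reference, Freeman's betweenness centrality of a node can be stated in standard notation, where $\sigma_{st}$ is the number of shortest paths between nodes $s$ and $t$, and $\sigma_{st}(v)$ the number of those that pass through $v$:

```latex
C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
```

A node lying on a large fraction of shortest paths between many pairs, a "bridge" between communities, thus receives a high score.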

Influential Nodes Detection

Here we describe the implementation of parallel betweenness centrality in an Information Centric Network (ICN) where AAL applications and data are distributed. The aim is to reduce and localize the size of the network by parallelizing the running of the algorithm that identifies the influential bridge nodes (Fig. 1).

Figure 1: An ego-network with labelled circles (McAuley and Leskovec 2012)

Our proposed algorithm uses the parallel betweenness social centrality scheme for identifying the bridge nodes in the network. The proposed solution pseudocode is shown below. Here we specify a graph G(V, E) that represents the network topology. The nodes with the highest parallel betweenness centrality, designated by the ParallelBC function, are denoted G_s^o; we then use the variable Ccurrent to hold the node selected to participate in the simulation.
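As a minimal, illustrative sketch of the ParallelBC idea (a Brandes-style betweenness computation parallelized over source nodes; the graph, the thread-based pool, and the names `parallel_bc` and `single_source_dependencies` are assumptions for illustration, not the authors' implementation):

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def single_source_dependencies(graph, s):
    """One source of Brandes' algorithm: BFS shortest-path counting,
    then back-propagation of pair dependencies."""
    sigma = {v: 0 for v in graph}; sigma[s] = 1   # shortest-path counts
    dist = {v: -1 for v in graph}; dist[s] = 0
    preds = {v: [] for v in graph}
    order, queue = [], deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in graph}
    for w in reversed(order):                     # accumulate dependencies
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0
    return delta

def parallel_bc(graph, workers=4):
    """Betweenness centrality with per-source work spread over a pool.
    (Threads keep the sketch self-contained; a process pool would give
    true parallelism for large graphs.)"""
    bc = {v: 0.0 for v in graph}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for delta in pool.map(lambda s: single_source_dependencies(graph, s), graph):
            for v, d in delta.items():
                bc[v] += d
    return {v: x / 2.0 for v, x in bc.items()}    # undirected: each pair counted twice

# Hypothetical topology G(V, E): two triangles joined by the bridge node "c".
graph = {"a": ["b", "c"], "b": ["a", "c"],
         "c": ["a", "b", "d", "e"],
         "d": ["c", "e"], "e": ["c", "d"]}
bc = parallel_bc(graph)
Ccurrent = max(bc, key=bc.get)                    # the influential bridge node
```

On this toy topology the bridge node "c" ends up with the highest score, which is the node `Ccurrent` would select for caching.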

Conclusions

Our results demonstrate that by using social-centrality caching schemes for data storage, and especially parallel betweenness centrality, we can significantly reduce the distance of data to users. This also provides a better experience to the end user, since the closer the content is to the requester, the easier it is to access.

References

• Azimdoost B., Westphal C., Sadjadpour H. R. 2016. Fundamental limits on throughput capacity in information-centric networks, IEEE Transactions on Communications, 64, 12: 5037–5049.
• Bastug E., Bennis M., Kountouris M., Debbah M. 2015. Cache-enabled small cell networks: Modeling and tradeoffs, EURASIP Journal on Wireless Communications and Networking, 1, 41.
• Bharath B., Nagananda K., Poor H. V. 2016. A learning-based approach to caching in heterogeneous small cell networks, IEEE Transactions on Communications, 64, 4: 1674–1686.
• Bernardini C., Silverston T., Festor O. 2014. Socially-aware caching strategy for content centric networking, In Proceedings 2014 IFIP Networking Conference (IFIP '14), IEEE, 1–9.
• Castells M. 2004. Informationalism, networks, and the network society: A theoretical blueprint, The Network Society: A Cross-cultural Perspective, 3–45, Cheltenham and Northampton, MA, USA.
• Cross R., Borgatti S. P., Parker A. 2001. Beyond answers: Dimensions of the advice network, Social Networks, 23, 3: 215–235.
• Freeman L. 1977. A set of measures of centrality based on betweenness, Sociometry, 40, 1: 35–41.
• Hamidouche K., Saad W., Debbah M. 2014. Many-to-many matching games for proactive social-caching in wireless small cell networks, In Proceedings 12th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt '14), IEEE, 569–574.
• Leskovec J., Horvitz E. 2008. Planetary-scale views on a large instant-messaging network, In Proceedings of the 17th International Conference on World Wide Web, ACM, New York, NY, 915–924.
• Li J., Chen H., Chen Y., Lin Z., Vucetic B., Hanzo L. 2016. Pricing and resource allocation via game theory for a small-cell video caching system, IEEE Journal on Selected Areas in Communications, 34, 8: 2115–2129.
• Mangili M., Martignon F., Paris S., Capone A. 2017. Bandwidth and cache leasing in wireless information-centric networks: A game-theoretic study, IEEE Transactions on Vehicular Technology, 66, 1: 679–695.
• McAuley J., Leskovec J. 2012. Learning to discover social circles in ego networks, In Proceedings of Advances in Neural Information Processing Systems 25 (NIPS '12).
• Newman M. E. 2004. Analysis of weighted networks, Physical Review E, 70, 5: 056131.
• Pantazopoulos P., Stavrakakis I., Passarella A., Conti M. 2010. Efficient social-aware content placement in opportunistic networks, In Proceedings Seventh International Conference on Wireless On-demand Network Systems and Services (WONS '10), IEEE, 17–24.
• Scott J. 2017. Social Network Analysis, Sage Publications.
• Sourlas V., Gkatzikis L., Flegkas L., Tassiulas L. 2013. Distributed cache management in information-centric networks, IEEE Transactions on Network and Service Management, 10, 3: 286–299.
• Travers J., Milgram S. 1967. The small world problem, Psychology Today, 1, 1: 61–67.
• Wang Y., Li Z., Tyson G., Uhlig S., Xie G. 2016. Design and evaluation of the optimal cache allocation for content-centric networking, IEEE Transactions on Computers, 65, 1: 95–107.
• Yu Y.-T., Bronzino F., Fan R., Westphal C., Gerla M. 2015. Congestion-aware edge caching for adaptive video streaming in information-centric networks, In Proceedings 12th Annual Consumer Communications and Networking Conference (CCNC '15), IEEE, 588–596.

Framework for the development of mobile AAL applications

Katerina Papanikolaou 1, Constandinos Mavromoustakis 2, Ciprian Dobre 3

1 European University Cyprus, Nicosia, Cyprus, Department of Computer Science and Engineering 2 University of Nicosia, Nicosia, Cyprus, Department of Computer Science 3 University Politehnica of Bucharest, Splaiul Independentei 313, Bucharest, Romania

[email protected], [email protected], [email protected]

Keywords: User experience, adaptive interface design for AAL, usability for mobile applications


Introduction

Software development for Ambient Assisted Living applications has special requirements (Alkhomsan et al. 2017). Here we identify and propose the essential elements for the development of user-centred AAL mobile applications that not only comply with basic usability requirements but are also engineered so that their use by the elderly is easy and enjoyable. Most software development tools are neutral to the age of their users, especially those for the mobile-app environment, where special limitations apply (Pal, Funilkul, Charoenkitkarn, Kanthamanon 2018), resulting in software that is difficult to use and thus frustrates its users. In this part of our work we identify the major components that should be included in the development of AAL mobile application software.

Simplicity

The application must be very simple to use, since the user is probably a novice, inexperienced with technology.

Interface appearance

Colour and contrast are an important requirement in such an application, where a large portion of users have issues with vision. Low contrast may result in the eye missing information in the user interface. Best practice for a good interface is high contrast between the components of every page.
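One concrete, widely used way to quantify "high contrast" is the WCAG contrast ratio; the sketch below computes it from sRGB colours (the 4.5:1 threshold mentioned in the comment is WCAG's AA minimum for body text, an external convention rather than a requirement stated in this abstract):

```python
def _channel(c):
    """Linearize one sRGB channel (0-255) per the WCAG relative-luminance definition."""
    c = c / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def contrast_ratio(rgb1, rgb2):
    """WCAG contrast ratio between two sRGB colours, from 1:1 up to 21:1."""
    def luminance(rgb):
        r, g, b = (_channel(v) for v in rgb)
        return 0.2126 * r + 0.7152 * g + 0.0722 * b
    l1, l2 = sorted((luminance(rgb1), luminance(rgb2)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Black text on white achieves the maximal 21:1 ratio;
# light grey on white falls below the 4.5:1 WCAG AA threshold for body text.
black_on_white = contrast_ratio((0, 0, 0), (255, 255, 255))
grey_on_white = contrast_ratio((170, 170, 170), (255, 255, 255))
```

A designer can run every foreground/background pair of a page through such a check before settling on a palette.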

Like colour and contrast, fonts matter for such a system. Fonts affect how good and easy the design looks, and can make or break an application's interface based on the first impression the users get.

Regarding text size, since the elderly may have visual impairments such as glaucoma or presbyopia, or might wear glasses, the text should be offered at larger sizes.

Buttons are very important for the functionality and the overall look of the interface; button size, too, needs to be chosen in relation to the state of the user's vision.

Spacing, like the size of buttons and text, aids the readability of the application. An interface that looks good is spaced in such a way that it is easy to look at and efficient to use with minimal effort.

Interaction

Interaction is important to any system, but especially to an application targeting the elderly. As the elderly often lack knowledge of technology in general, or of using an application in particular, giving feedback is key. Feedback is important in order to answer frequent questions about the functionality of the application (Pal, Funilkul, Papastratorn 2018).

Audio Feedback

As a different kind of feedback, audio feedback can be helpful to users with visual impairments. Some best practices include audio feedback in the form of a voice message for every action on the user interface; others use one voice message per page, hinting at what the user should do next and how to perform a certain action.

Summary

In summary, the framework proposed for developing AAL mobile applications targeting the elderly population focuses on simplicity, reliability, and ease of use. Elderly users have specific requirements that should not be overlooked, even though this might limit software functionality. It is very important when developing such an application to remember that simplicity is a key requirement, and that the targeted users may suffer from severe impairments and have little knowledge of technology. Following the basic principles of this framework is therefore very important for successful results.

References

• Alkhomsan M. N., Hossain M. A., Rahman S. M., Masud M. 2017. Situation Awareness in Ambient Assisted Living for Smart Healthcare, IEEE Access, 5: 20716–20725
• Pal D., Funilkul S., Charoenkitkarn N., Kanthamanon P. 2018. Internet-of-Things and Smart Homes for Elderly Healthcare: An End User Perspective, IEEE Access, 6: 10484–10496
• Pal D., Funilkul S., Vanijja V., Papastratorn B. 2018. Analyzing the Elderly Users' Adoption of Smart-Home Services, IEEE Access, 6: 51238–51252

The use of technology to promote activity and communication by the municipality services, examples from an urban area

Birgitta Langhammer, Ainta Woll, Jeppe Skjønberg

1 Oslo Metropolitan University 2 University of Oslo 3 Baerum Kommune

[email protected], [email protected], [email protected]

Keywords: welfare technology, activity, home dwelling elderly


Background

The growing elderly population challenges the current organization of elderly care services and calls for new ways of delivering care work in a more cost-efficient manner (Directorate of Health 2013). The public health authorities stress that the current amount of formal and institutionalized care services cannot be maintained over time without compromising the welfare society. Thus, elderly care arrangements must transform into a more sustainable delivery of services involving, to a larger degree, self-care and informal care. To accomplish this shift in elderly care work, governments have developed several strategies, including active aging, new types of housing supporting independent living, increased use of home-based care services, and expanded use of assistive technologies.

To enable elderly people to live independently in their homes, the government aims to de-institutionalize elderly care services by upscaling home care services and care housing and downscaling long-term stays at nursing homes. The increasing use of assistive technologies will play a significant role in the ongoing transformation of care services; however, our empirical data shows how difficult the appropriation and use of technology are for elderly end-users.

Most elderly people want to live independently in their homes for as long as possible, and the majority of elderly people (74%) are actually doing so with the support they get from their family (see Figure 1).

Figure 1: Housing arrangements for elderly Norwegians from 67 years of age (Statistics Norway 2013)

Over the last two decades, housing-oriented care such as assistive housing has become an international trend (Daatland and Otnes 2014) but is still viewed as controversial in Norway. Assistive housing includes nursing homes and care homes, whereas the government's aim is to upscale care homes, downscale long-term stays in nursing homes, and develop more extensive and robust home care services to provide increased home-based services (Daatland and Otnes 2014).

The increased use of technology-supported care also plays a significant role in the transformation of elderly care services. However, in Norway, the use of assistive technology has been relatively limited in scale and has mostly concerned the use of a personal safety alarm.

The notion of assistive technology, in Norway termed "welfare technology", is an umbrella term for user-oriented technologies that aim to provide or assist users with public or private welfare services. Welfare technology is defined in a wide sense as technology that supports citizens in participating in society. These services can be technical support for the user's safety and security, daily life activities, self-reliance, and participation in activities (Ministry of Education and Research 2011, 2013).

Welfare technology is divided into four main categories of technology support:

1. safety and security, e.g., the safety alarm service,
2. compensation and wellness, e.g., memory support, walker, light/heating control,
3. social contact, e.g., video communication, and
4. care and treatment, e.g., blood glucose meter, blood pressure monitor.

We find it useful to distinguish between active and passive use of technologies, pointing to the fact that technologies can be designed with various levels of automation in order to support a diversity of users with different functional disabilities, ranging from minor to severe (see e.g., Ministry of Education and Research 2013). We argue elsewhere that even fully automated technology can give users full autonomy, and that there is no one-to-one relationship between the level of automation and the user's autonomy (Woll and Bratteteig 2018).

Conscious and active use of technology refers to situations when a person interacts with technology for a purpose, like doing a blood glucose check or pushing the personal safety alarm button when in need of assistance. This requires that the user is familiar with the technology's function and knows how to perform the necessary steps and accurately operate it. Active use differs from passive use of technology, where the technology does not require any conscious input from the user, e.g., an alarm is triggered automatically when preset conditions are met. For example, a fall sensor can call for assistance on behalf of the user when it detects the user on the floor. In such cases the user interacts with the technology without being aware of its interaction mechanism. Other examples of passive use of technology include shower nozzles and water faucets that keep the bathing water at a constant temperature, infrared water tap flow control, or automatic refrigerator door closers, to mention a few (Mao et al. 2015).

Passive use of technology is particularly interesting for users who cannot be expected to understand abstract or symbolic representations (Woll and Bratteteig 2018). For example, nursing homes with old residents ("the oldest old") often have a high prevalence of dementia, suggesting a higher level of automation and passive use for the technology to support the users. Passive use of technology is not merely for residents in nursing homes. Most elderly people suffering from dementia live in private homes or care housing. Thus, implementation of various sensor technologies can support them by increasing their safety and security with respect to preventing fire, preventing or detecting falls, or alerting if they wander outdoors during the night.


(Figure 2 labels, bottom to top of the care staircase: private home — everyday life, simplifying tools and arrangements, assistive technology, everyday/domestic assistance; care+ home — practical assistance and upkeep, medical assistance, personal grooming and care; nursing home — end-of-life care for those who are fragile, feel unsafe, are bedridden, or need in-patient care; the trajectory is initiated by accident, illness, or aging.)

Figure 2: An illustration of the care staircase compared to our technology-supported elderly care trajectory. The blue gradient color in the trajectory illustrates the amount of technology use.

Figure 3 (below) illustrates how technology can enhance self-care, informal care, and formal care so that the time within each phase is potentially prolonged for the elderly persons traversing the trajectory. The level of support can be designed as layers of alternative solutions for a specific service, which can then better match changing user needs and fit a person's capacities and general condition. One example is the daily activity of grocery shopping, which is in the pre-trajectory phase (this example could also be illustrated in the care housing phase). Many elderly people with mobility or balance disorders use a walker for support when walking and going shopping. Several walkers are equipped with a basket for storage, e.g., to carry home groceries. If elderly persons are not able to physically go to the shop, they can shop for groceries online and have them delivered to the door. Moreover, if shopping for groceries online is found difficult or impossible because of poor general condition, the person can get support from volunteer helpers or family members. In Figure 3, we illustrate some examples of layers of services connected to the housing situation of an elderly person.

Figure 3: The figure illustrates layers of increased support for self-care, informal care, and formal care work within distinct phases of the trajectory.


A challenge with the approach presented in Figure 3 is to find solutions that fit individual needs. First and foremost, the services need to match a user's actual needs and capacities. There is no one universal solution that fits all users. The formal care workers need to evaluate each user according to their actual needs. Types of interventions should be discussed and agreed upon in collaboration with the formal care workers, e.g., home care nurses, as they have the responsibility for the formal services included in the trajectory. Thus, even if formal care workers delegate work responsibility to the self-care worker or informal care worker, it is still the formal care worker who has the main responsibility (and needs to be the back-up worker).

We argue that there is a need for several types of technologies for the same service in order to find a solution that fits a user's preferences. For example, a specific design of a digital medicine dispenser is not necessarily intuitive for all users, so distinct types need to be introduced if one type is found hard to learn.

Technology-supported critical services, such as cameras and sensors for digital supervision, must work 24/7. If they do not work as planned or fail to alert when critical incidents take place, the user's health and well-being can be at risk. This requires self-testing solutions that alert when equipment is not online. It also calls for technical staff supporting care workers to maintain equipment and make sure that the technology-supported services are robust.

In this presentation we will show and discuss examples from two municipalities in Norway of how welfare technology and other strategies in the local surroundings are introduced to meet these challenges.

References

• Directorate of Health (2013). Welfare technology. Technical report on the implementation of welfare technology in municipal healthcare: 2013–2030. Oslo: Norwegian Directorate of Health.
• Mao, Hui-Fen; Ling-Hui Chang; Grace Yao; Wan-Yin Chen; and Wen-Hi W. Huang (2015). Indicators of perceived useful dementia care assistive technology: Caregivers' perspectives. Geriatrics & Gerontology International, vol. 15, pp. 1049–1057.
• Ministry of Education and Research (2011). Innovasjon i omsorg. NOU 2011:11. Oslo: Ministry of Health and Care Services. Accessed 12 October 2018.
• Ministry of Education and Research (2013). Morgendagens omsorg. Meld. St. 29 (2012–2013). Oslo: Ministry of Health and Care Services. Accessed 12 October 2018.
• NAKU (2017). The care staircase. Accessed 5 April 2017.
• Statistics Norway (2013). Elderly people's use of healthcare services. Accessed 7 November 2015.
• Woll, Anita (2017). Use of welfare technology in elderly care. Ph.D. dissertation. Oslo: Department of Informatics, University of Oslo.
• Woll, Anita; and Tone Bratteteig (2018). Activity Theory as a Framework to Analyze Technology-Mediated Elderly Care. Mind, Culture and Activity, vol. 25, no. 1, pp. 6–21.

Thermal environment assessment and adaptive model analysis in elderly centers in the Atlantic climate

Pedro Torres1, Joana Madureira1, Lívia Aguiar2, Cristiana Pereira1,2, Armanda Teixeira-Gomes1, Maria Paula Neves2, João Paulo Teixeira1,2, Ana Mendes1,2

1 EPIUnit - Instituto de Saúde Pública, Universidade do Porto, Portugal
2 Environmental Health Department, National Health Institute Dr. Ricardo Jorge, Portugal
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Keywords: Elderly, thermal comfort, elderly centers, adaptive model, Atlantic climate


This study focuses on assessing occupants' thermal comfort (TC) in 8 elderly centers located in the Metropolitan Area of Porto, a representative zone of the Atlantic climate in Portugal. The determination of TC parameters involves (i) the design of spaces; (ii) the well-being of users; and (iii) the energy efficiency of building facilities (heating and cooling systems). The development of TC models is needed in order to predict the thermal physiology and comfort of this susceptible population. The measurement campaign takes place from February to December 2019, and data collection is conducted 3 times per season in each participating institution, allowing a year-round analysis. For the purposes of this study, the indoor thermal environmental parameters (air temperature, air velocity, relative humidity, and mean radiant temperature), the outdoor conditions (temperature and relative humidity), as well as the physical activity (met) and clothing level (clo), are being analyzed in 22 living and activity rooms of the selected elderly centers. This makes it possible to calculate the PMV (Predicted Mean Vote) and PPD (Predicted Percentage Dissatisfied) indices and, subsequently, evaluate TC. Moreover, a detailed building characterization has been performed by a walk-through survey including information such as type of building construction (concrete, masonry); thermal insulation of the building envelope (type of windows and doors, presence of weather stripping); ventilation system (natural, mechanical, hybrid) and practices (opened windows and doors); types of indoor materials; use of heating appliances; evidence of dampness or mold; etc.
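The PMV and PPD indices follow Fanger's heat-balance model as standardized in ISO 7730, combining exactly the six inputs listed above (four environmental parameters plus met and clo). As a hedged illustration — not the authors' own code — a minimal Python sketch of the standard iterative algorithm (variable names follow the ISO reference implementation; external work is assumed zero):

```python
import math

def pmv_ppd(ta, tr, vel, rh, met, clo):
    """Fanger PMV/PPD per the ISO 7730 reference algorithm (zero external work).

    ta: air temperature (C), tr: mean radiant temperature (C),
    vel: air velocity (m/s), rh: relative humidity (%),
    met: metabolic rate (met), clo: clothing insulation (clo)."""
    pa = rh * 10 * math.exp(16.6536 - 4030.183 / (ta + 235))  # vapour pressure, Pa
    icl = 0.155 * clo                                         # insulation, m2K/W
    m = met * 58.15                                           # metabolic rate, W/m2
    fcl = 1.05 + 0.645 * icl if icl > 0.078 else 1 + 1.29 * icl  # clothing area factor
    hcf = 12.1 * math.sqrt(vel)                               # forced convection coeff.
    taa, tra = ta + 273, tr + 273
    # iteratively solve for the clothing surface temperature
    tcla = taa + (35.5 - ta) / (3.5 * icl + 0.1)              # first guess
    p1 = icl * fcl
    p2, p3, p4 = p1 * 3.96, p1 * 100, p1 * taa
    p5 = 308.7 - 0.028 * m + p2 * (tra / 100) ** 4
    xn, xf = tcla / 100, tcla / 50
    for _ in range(150):
        if abs(xn - xf) <= 0.00015:
            break
        xf = (xf + xn) / 2
        hcn = 2.38 * abs(100 * xf - taa) ** 0.25              # natural convection
        hc = max(hcf, hcn)
        xn = (p5 + p4 * hc - p2 * xf ** 4) / (100 + p3 * hc)
    else:
        raise RuntimeError("clothing temperature iteration did not converge")
    tcl = 100 * xn - 273
    # heat losses: skin diffusion, sweating, respiration (latent + dry),
    # radiation, convection
    hl1 = 3.05e-3 * (5733 - 6.99 * m - pa)
    hl2 = 0.42 * (m - 58.15) if m > 58.15 else 0
    hl3 = 1.7e-5 * m * (5867 - pa)
    hl4 = 0.0014 * m * (34 - ta)
    hl5 = 3.96 * fcl * (xn ** 4 - (tra / 100) ** 4)
    hl6 = fcl * max(hcf, 2.38 * abs(tcl - ta) ** 0.25) * (tcl - tr)
    ts = 0.303 * math.exp(-0.036 * m) + 0.028                 # sensation coefficient
    pmv = ts * (m - hl1 - hl2 - hl3 - hl4 - hl5 - hl6)
    ppd = 100 - 95 * math.exp(-0.03353 * pmv ** 4 - 0.2179 * pmv ** 2)
    return pmv, ppd
```

For a typical heated living room (ta = tr = 22 °C, low air movement, sedentary residents in winter clothing) the sketch returns a slightly negative PMV, consistent with the winter values reported in Table 1; PPD has a floor of 5% at PMV = 0.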

Thermal sensation is also being estimated by interviewing the residents during each TC data collection, determining thermal sensation votes (TSV) as well as preference and acceptability levels through the ASHRAE scales. So far, 160 questionnaires have been answered, and up to 600 questionnaires in total are expected by December. This ongoing study is now in its summer-season monitoring, having already collected the winter and spring season data, both for the TC assessment and the TSV survey. By way of example, preliminary PMV and PPD results for the winter and spring seasons are shown in Table 1. Considering the recommended range of PMV values for optimum thermal comfort (between -0.2 and +0.2), P07 is the only institution whose PMV value (winter season) falls within this range. The upcoming data analysis will consider the application of environmental comfort models in a longitudinal multi-criteria analysis (as seen in Fig. 1). This approach of adaptive TC models for older people also offers the possibility of improving residents' quality of life and saving building energy at the same time.
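Adaptive comfort models relate the indoor comfort temperature to a running mean of recent outdoor temperatures rather than to fixed set-points. A minimal sketch of the EN 15251/EN 16798-1 style relation (the 0.33 and 18.8 coefficients come from those standards; the ±3 K band and α = 0.8 are typical values assumed here for illustration, not taken from this study):

```python
def running_mean_outdoor_temp(daily_means, alpha=0.8):
    """Exponentially weighted running mean of daily mean outdoor temperatures.

    daily_means: previous days' mean temperatures in C, most recent day first."""
    weights = [(1 - alpha) * alpha ** i for i in range(len(daily_means))]
    return sum(w * t for w, t in zip(weights, daily_means)) / sum(weights)

def adaptive_comfort_temp(t_rm):
    """Optimal operative comfort temperature (C) for a running mean outdoor temp."""
    if not 10.0 <= t_rm <= 30.0:
        raise ValueError("adaptive model applies for running means of roughly 10-30 C")
    return 0.33 * t_rm + 18.8

def comfort_band(t_rm, half_width=3.0):
    """Acceptable operative temperature range around the comfort temperature."""
    tc = adaptive_comfort_temp(t_rm)
    return tc - half_width, tc + half_width
```

For example, a running mean outdoor temperature of 20 °C gives a comfort temperature of about 25.4 °C, with an acceptable band of roughly 22.4–28.4 °C under the assumed ±3 K width.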

Institution (code)   Winter PMV   Winter PPD [%]   Spring PMV   Spring PPD [%]
P01                  -            -                0.23         11.0
P02                  -0.44        9.68             0.13         6.5
P03                  0.63         13.66            0.62         15.8
P04                  0.49         11.46            0.63         15.1
P05                  0.25         8.25             0.45         14.0
P06                  -0.79        10.34            0.44         11.2
P07                  0.05         8.47             0.51         11.9
P08                  -            -                0.43         11.2

Table 1: Average preliminary results per institution of PMV and PPD indices


(Figure: the six input parameters of the PMV model — air temperature, radiant temperature, air humidity, air velocity, thermal insulation (clothing), and metabolic rate (activity).)

Acknowledgments

This research was funded by the Announcement of Selection of Multidisciplinary Research on Aging: ‘Fondo Europeo de Desarrollo Regional en el marco del Programa de Cooperación Interreg V-A España – Portugal, (POCTEP) 2014-2020, Expediente: 6/2018_CIE_6’, under the Coordinated Program ‘ConTerMa- Análisis del confort térmico en residencias de ancianos en el espacio de cooperación transfronterizo de España-Portugal’. This work is a contribution to COST Action CA16226 - Indoor living space improvement: Smart Habitat for the Elderly. The work developed by ATG and JM is supported by the Portuguese Foundation for Science and Technology (FCT) under the grants SFRH/BD/121802/2016 and SFRH/BD/115112/2016, respectively.

Climate change in the city and the elderly

Emanuele Naboni

KADK, The Royal Danish Academy. Institute of Architectural Technology, Denmark [email protected]

Keywords: climate change, urban heat island, pollution, urban design, elderly


As the urban population increases and ages, issues related to climate change become tangible. Climate change, combined with the urban heat island effect, alters cities' climatic patterns, manifesting in more extreme and more frequent weather events, including heatwaves. Climate change in dense cities places stress on everyone, but disproportionately on vulnerable populations, including the old and those with chronic illness. Older people are the most at risk because of decreased mobility, changes in physiology, and their limited adaptive capacity. This short paper presents a discursive overview of both the issues and the possible tools to foster urban design strategies that cope with them.

In the near future, climate change will have an even greater impact on urban environments, leading to environmental stresses on urban infrastructure, especially under business as usual. This condition is exacerbated in urban areas, where spaces are warmer than the surrounding rural areas. The densification of cities and the use of artificial materials have indeed led to higher temperatures. Concrete, asphalt, and steel reduce the city albedo (the solar reflectivity of surfaces), and the absorbed heat is retained in place by thermal mass. The subsequent installation of indoor air conditioning systems increases the overall urban temperature (heat is extracted from indoor spaces and emitted into the street). These adverse circular effects determine the so-called Urban Heat Island (UHI).

The effects of the UHI and high temperatures in cities are variable, depending on pre-exposure health status, psychological well-being, and social characteristics. The oldest old are more likely to suffer negative health effects from climate change because of physical decline or frailty. Individuals who suffer from multiple pre-existing chronic conditions and those who take medications that increase susceptibility to heat are more at risk. Underlying chronic medical conditions (for example, cardiovascular disease, obesity) exacerbate susceptibility, as do medications that affect the body's thermoregulatory capacity. In sum, research on heat waves suggests that interventions that mitigate climate change and the urban heat island should be beneficial to the elderly. As such, both tools and countermeasures to UHI effects have to be sought and implemented as soon as possible.

It is known that design choices, such as the form and materials of buildings and open spaces, alter local thermodynamic phenomena, which in turn influence outdoor thermal comfort. Outdoor thermal comfort, as opposed to indoor comfort, is ever-changing, with wide spatial and temporal variability due to weather changes. A set of strategies that modify outdoor thermal comfort and improve the thermal environmental quality of cities is available today. Outdoor comfort can be either digitally modelled or measured on-site. In the last decade, the scientific community has become increasingly interested in the topic, developing a few modelling tools to support its simulation. Outdoor comfort simulation tools and their capabilities have been assessed in previous research by the author, revealing that ENVI-met, CitySim, and Ladybug Tools are software tools of interest to designers for defining strategies to mitigate the UHI. But when designers are modelling the outdoor thermal comfort of a specific location, it is essential to understand how the elderly are currently experiencing it. Human physiological adaptation to the environment can be modelled, but on-site information from local measurements, wearable sensors, and comfort surveys provides valuable complementary evidence.

Among the full range of design solutions for mitigating heatwaves and the UHI, re-greening is the most viable. The progressive reduction of green areas has massively decreased the environmental quality of cities. This discourages the elderly from spending time outside. Greening of cities is a key strategy on many fronts: it improves the elderly's psychological wellbeing; it positively influences air quality via CO2 mitigation; and it increases outdoor thermo-hygrometric comfort. Besides, greenery is the only known side-effect-free solution against heat in cities. Its benefits include not only the direct contribution to climate action but also a significant impact on halting biodiversity loss, through increased urban biodiversity, and an increase in healthy living conditions in cities, through the additional binding of air and water pollutants. According to UN SDG #3, ensuring healthy lives and promoting well-being for all is, from a broader perspective, central to building prosperous societies. Debating the urban green scape links these topics in a section that focuses on the potential of natural elements. Improving elderly access to green spaces and water is thus important for the "urban agenda".

To conclude, the evidence that climate change will potentially adversely affect older people living in cities is compelling. If the rapidly increasing older population worldwide is to be protected from the effects of climate change to the highest degree, this will be possible via urban planning modelling, local and human biometrics recording and via science-based solutions, including cities’ re-greening.

References

• W. Rosa, Ed., "Transforming Our World: The 2030 Agenda for Sustainable Development," in A New Era in Global Health, New York, NY: Springer Publishing Company, 2017.
• T. R. Oke, "The energetic basis of the urban heat island," Q. J. R. Meteorol. Soc., vol. 108, no. 455, pp. 1–24, 1982.
• E. Kalnay and M. Cai, "Impact of urbanization and land-use change on climate," Nature, vol. 423, no. 6939, pp. 528–531, May 2003.
• Naboni, E. & Havinga, L. C. (2019). Regenerative Design in Digital Practice: A Handbook for the Built Environment. Bolzano, IT: Eurac. ISBN 978-3-9504607-2-8 (Online), ISBN 978-3-9504607-3-5 (Print).
• Naboni, E., Natanian, J., Brizzi, G., Florio, P., Chokhachian, A., Galanos, T., and Rastogi, P. "A Digital Workflow to Quantify Regenerative Urban Design in the Context of a Changing Climate." Renewable and Sustainable Energy Reviews 113 (October 1, 2019): 109255.

Loneliness is not simply being alone. How to understand loneliness in ageing to create smart environments

Lucia González López1, Ana Perandrés Gómez1, Alfonso Cruz Lendínez1,2

1 Ageing Social Lab Foundation, Parque Tecnológico de Geolit (Spain)
2 Universidad de Jaén, Department of Nursing

[email protected], [email protected], [email protected]

Keywords: loneliness, ageing, emotional loneliness, social loneliness, dignified and positive ageing


Loneliness is a widespread concern among older people, especially in societies that are ageing rapidly. Loneliness is defined as the "unpleasant state experienced when there is a discrepancy between the interpersonal relationships one wishes to have and those one perceives one actually has". However, loneliness is not only an unpleasant state with direct social consequences for the person; it also has negative psychological and physical impacts. Although loneliness has until now mostly been studied homogeneously as a one-dimensional construct, Weiss (1973) proposed a model in which this variable has two dimensions: social and emotional. It is specifically the emotional dimension of loneliness that acts as a harmful and "toxic" component for health and increases the risk of mortality (O'Súilleabháin, Gallagher & Steptoe, 2019). In this vein, it is important to understand the impact of this socio-emotional variable on the physiological bases (e.g. immune system, endocrine system) and neural correlates (e.g. central nervous system; Nakagawa, 2015) that can determine the quality of life and life expectancy of older people.

Loneliness also has a cross-cultural component: each society has group characteristics that determine how loneliness is conceptualized at the group level within that culture. Particularly, the feeling of loneliness is perceived as less harmful in individualistic societies (e.g. the United States) than in collectivist societies (e.g. Spain), although it is less common to be alone in the latter (Lykes & Kemmelmeier, 2014).

Understanding the differentiating characteristics of loneliness at both the individual and cultural levels can serve as an axis for improving the living conditions of older people, not only to intervene on a social level or on their physical and mental health, but also to develop products and services that adapt to the individual as configured within their own culture (for example, the segmentation of international markets taking into account the theoretical framework proposed by Hofstede; Soares, Farhangmehr, & Shoham, 2007; Hofstede, 1984).

Loneliness is thus a continuously ignored factor of vital importance that must be taken into account in the ageing process. Ageing Lab works to address the challenge of loneliness in a holistic manner, taking into account its multidimensional character, through various actions guided by a model of Dignified and Positive Ageing (PDA Model; Rodríguez González & Cruz Lendínez, 2016). These actions include, for example, an Intergenerational Center, projects that aim to address social isolation in ageing through the application of technology (Pharaon), and digital training (E-inclusion).

Specifically, the creation of an Intergenerational Center makes it possible to work with different generations (children and older adults), addressing the complementary needs of both and fostering positive and affective relationships among all ages, as well as the transmission of knowledge, respect, and social, emotional, and cognitive development; in short, preventing social isolation. In addition, the E-inclusion project aims to provide digital training tools for older people in their own homes. To this end, home help assistants are trained in digital technologies so that, in their daily and direct intervention with older adults, they can play the role of digital trainer, thus promoting both direct and online relationships. Finally, the Pharaon European project aims to integrate platforms with a multitude of technologies that seek to improve the quality of life and meet the needs of people in an ageing society. Specifically, through this project, Ageing Lab as a partner aims to focus on the social field to respond to the challenge of loneliness.

References

• Brown, E. G., Gallagher, S., & Creaven, A. M. (2018). Loneliness and acute stress reactivity: A systematic review of psychophysiological studies. Psychophysiology, 55(5), e13031.
• Hofstede, G. (1984). Culture's consequences: International differences in work-related values. Newbury Park, CA: Sage Publications.
• Lykes, V. A., & Kemmelmeier, M. (2014). What predicts loneliness? Cultural difference between individualistic and collectivistic societies in Europe. Journal of Cross-Cultural Psychology, 45(3), 468-490.
• Nakagawa, S., Takeuchi, H., Taki, Y., Nouchi, R., Sekiguchi, A., Kotozaki, Y., ... & Yamamoto, Y. (2015). White matter structures associated with loneliness in young adults. Scientific Reports, 5, 17001.
• O'Súilleabháin, P. S., Gallagher, S., & Steptoe, A. (2019). Loneliness, living alone, and all-cause mortality: The role of emotional and social loneliness in the elderly during 19 years of follow-up. Psychosomatic Medicine, 81(6), 521-526.
• Rodríguez González, A., & Cruz Lendínez, A. (2016). Modelo EDP: Envejecimiento Digno y Positivo. Jaén: Fundación Ageing Lab (1ª Ed.).
• Soares, A. M., Farhangmehr, M., & Shoham, A. (2007). Hofstede's dimensions of culture in international marketing studies. Journal of Business Research, 60(3), 277-284.
• Weiss, R. S. (1973). Loneliness: The experience of emotional and social isolation. Cambridge, MA: The MIT Press.

From Co-design to Industry: investigating both ends of the scale of developing Social Robots

Kasper Rodil1, Michal Isaacson2

1 Department of Architecture, Design and Media Technology, Aalborg University, Denmark 2 Faculty of Social Welfare & Health Sciences Department of Gerontology, Haifa University, Israel

[email protected], [email protected]

Keywords: social robots, industry, co-design, participatory design


Introduction

This paper merges two separate work packages: Co-design as a genuine bottom-up approach, and industry-based perspectives on the development and future of social robotics. The first work package is mainly informed by a literature review of the premiere outlet for Co-design, the Participatory Design Conference series (biennial since 1990), with a focus on the elderly and social robots. That work package is published in the [COST] state-of-the-art report and is underpinned by our own publications and research from a Danish neuro-centre context of co-designing social robots (Rodil, Rehm and Krummheuer, 2018).

The other, industrial, view has been informed by a Short Term Scientific Mission (STSM) granted under the Sheld-on COST Action (CA16226), enabling a visit to Dr Isaacson at the Department of Gerontology, Faculty of Social Welfare & Health Sciences, Haifa University, Israel, and to a series of companies developing social robots (and related technologies) for the ageing population. The STSM objectives are listed in (STSM, 2018).

The concept of Social Robots on a scale

Social robots are defined, according to the International Journal of Social Robotics, as follows: "Social Robotics is the study of robots that can interact and communicate between themselves, with humans, and with the environment, within the social and cultural structure attached to its role." (IJSR, 2018)

There are many definitions like the one above, and they all share an inability to accurately address what makes a robot a social robot. In times that are very positive towards the multitude of challenges that can supposedly be remedied by artificial intelligence, robots, etc., it became a somewhat strange experience to keep reading about the promised land of these social robots, both in academic outlets and in mass news outlets. As the definition above fails to capture the most basic nature of social robots, we decided to investigate the concept and its plausible relation to the ageing population. We formulated an initial research question based on the assumption that the ageing population, with shrinking personal networks and (in many cases) less digital literacy than younger generations, must be primary adopters of sociality-enabling autonomous creatures: social robots.

The idea of a scale came from considering how underlying technology development processes might or might not be the reason why there is little fruitful construction and uptake of technology. On one end we positioned perhaps the most academic and bottom-up process (Participatory Design/Co-design), and on the other end user-centred companies employing the novel technologies available, including some of the companies where the expectations of success are highest. The question was: does either of these two ends represent technology that has enabled the ageing population to be more social with the use of robots?

Findings

The investigation began by surveying 433 papers from the Participatory Design Conference series, as Participatory Design (PD) represents a design methodology that strongly argues for the inclusion of future users in design processes as a way to handle technical barriers and make systems more relevant on a small scale (contextual and situational), all to ease adoption.

The first time robots are mentioned directly in the literature is in the proceedings of the PDC in 2002. In conclusion, the field of PD (as delimited by the PDC proceedings and our search strategy) does not provide any technical endeavours cracking the code of making robots social for the ageing population. Robot solutions are primarily limited to ideation (very early project phases), and very little tangible robotic development is published. As the field of PD has fallen short in providing a way into a promising social robotic future, we decided to look at the other end, the industrial one. This industrial end shows optimism in the millions of dollars invested and holds a liberal, almost deterministic view of technology.

It should be mentioned that the companies visited in Israel are among those from which the rest of the world expects the most.

To summarise the insights gained during the STSM:

The visited companies in Israel are:

1. Not designing and developing with a participatory agenda. In general, the companies' approach can best be classified as user-centred.
2. Facing significant challenges in identifying who will pay for their products. Products are too expensive for the ageing population (B2C) and not attractive enough for institutions (B2B).
3. Not yet able to create fully functioning social robots, and the robots are not social.

The conclusion is that there are not yet social robots for the ageing population to adopt, nor are these systems just around the corner. Technologically they are not advanced enough or affordable, but most importantly, they do not seem to capture the essence of being social.

References

• IJSR: International Journal of Social Robotics (2018). Retrieved 09-02-2018.
• Rodil, K., Rehm, M., & Krummheuer, A. L. (2018). Co-designing Social Robots With Cognitively Impaired Citizens. In Proceedings of the 10th Nordic Conference on Human-Computer Interaction (NordiCHI '18). Association for Computing Machinery. Oslo, Norway, 01/10/2018.
• STSM Objectives (2018). Accessed 21/8/2019.

Thermal comfort perception in elderly care centers. Case study of Lithuania

Lina Seduikyte1, Mantas Dobravalskis2, Ugnė Didžiariekytė3

1 Assoc. prof., Kaunas University of Technology, Faculty of Civil Engineering and Architecture, Studentu str. 48, 51367 Kaunas, Lithuania
2 PhD student, Kaunas University of Technology, Faculty of Civil Engineering and Architecture, Studentu str. 48, 51367 Kaunas, Lithuania
3 Master student, Kaunas University of Technology, Faculty of Civil Engineering and Architecture, Studentu str. 48, 51367 Kaunas, Lithuania

[email protected], [email protected], [email protected]

Keywords: elderly, indoor environment, thermal comfort, care centers


Introduction

People spend 80–90% of their time indoors. It is important to ensure both good indoor air quality and thermal comfort indoors, especially for people who belong to vulnerable groups. Studies show that different thermohygrometric values are required to provide indoor conditions that satisfy aged people, especially during the winter season (Salata et al. 2018). Giamalaki and Kolokotsa (2019) presented a study showing that the elderly preferred a warmer thermal environment in the winter season and a cooler one in summer. The aim of this study is to investigate thermal comfort conditions in an elderly care center located in Kaunas, Lithuania.

Field measurements

Field measurements were performed in the winter (February) and summer (August) periods of 2019 in the Panemunes Silas elderly care center (ECC) in Kaunas, Lithuania. Temperature and relative humidity (RH) were measured in 15 naturally ventilated rooms (6 belonging to an intensive care section) with HOBO MX11011 data loggers (±0.2 °C, ±2% RH accuracy). The stand with instruments was placed in the middle of each room, and temperature and RH were measured in all tested rooms at 0.1, 0.6, and 1.1 m height. Parameters were recorded at 1-min intervals. A short-term measurement method was used: equipment was placed in each room for 35 min, of which 25 min were dedicated to the stabilization of indoor environmental conditions, and the final 10 min of data were analysed.
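The short-term protocol above (35 min logged at 1-min intervals, with the first 25 min discarded as stabilization) amounts to a simple reduction over the logger output. A minimal sketch, where `readings` is a hypothetical per-minute export of (temperature, RH) tuples from the logger (not the study's actual processing code):

```python
def room_averages(readings, stabilization_min=25):
    """Average temperature and RH over the post-stabilization part of a short-term log.

    readings: chronological per-minute (temperature_C, rh_percent) tuples."""
    analysed = readings[stabilization_min:]  # drop the stabilization period
    if not analysed:
        raise ValueError("log is shorter than the stabilization period")
    avg_t = sum(t for t, _ in analysed) / len(analysed)
    avg_rh = sum(rh for _, rh in analysed) / len(analysed)
    return avg_t, avg_rh
```

Running this per room and per height (0.1, 0.6, 1.1 m) yields the seasonal averages reported in the Results section.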

Results

In Fig. 1, the average indoor temperature and RH of winter and summer periods are presented.

Winter season results show that the average room temperature was 20.36 °C and the average relative humidity was 37.38%. In some rooms values were outside national reference norms (HN 125:2011), with temperature as low as 17.52 °C and relative humidity below 24%.

Summer season results show that the average room temperature was 24.02 °C and the average relative humidity was 55.18%.


Figure 1: Average temperature and relative humidity in the rooms at a height of 1.1 m

Conclusions

Some studies (Salata et al. 2018) show that even if the thermal comfort zone is achieved (temperature between 20 °C and 24 °C), it might not be warm enough for elderly people, as they would prefer a temperature 2 °C higher than the neutral temperature.

The investigation in the Panemunes Silas ECC shows that in some cases thermal comfort would not be ensured for elderly people.

There is a need for further investigation of thermal comfort conditions in premises where elderly people spend most of their time.

References

• Giamalaki M. and Kolokotsa D. 2019. Understanding the thermal experience of elderly people in their residence: Study on thermal comfort and adaptive behaviors of senior citizens in Crete, Greece. Energy and Buildings, 185: 76-87.
• Salata F., Golasi I., Verrusio W., de Lieto Vollaro E., Cacciafesta M., de Lieto Vollaro A. 2018. On the necessities to analyse the thermohygrometric perception in aged people. A review about indoor thermal comfort, health and energetic aspects and a perspective for future studies. Sustainable Cities and Society, 41: 469-480.

Acknowledgments

This work was supported by COST Action CA16226 “Indoor living space improvement: Smart Habitat for the Elderly (SHELD-ON)”.

Human–machine interaction: assessment of seniors' characteristics and their importance for predicting performance patterns

Karel Schmeidler1, Katerina Marsalkova2

1 Institute of Forensic Engineering, Brno University of Technology, Purkynova 118, 612 00 Brno, Czech Republic
2 Czech Ministry of Justice, Vysehradská 16, 128 00 Prague, Czech Republic

[email protected], [email protected]

Keywords: ageing European population, seniors, HMI - human machine interface, IT solutions, mobility behaviour, cross-disciplinary research


Demographic trends in recent decades indicate a marked rise in the number of elderly people in the population, and there is a high correlation between age, mobility, and disability. The share of elderly people in the total population is expected to rise from 21% in 2000 to 31% by 2020 and to around 34% by 2050. Disabled people represent around 13% of the population. Various initiatives recognize the needs of elderly people. With changing attitudes and conditions, the desire to travel for social life, business, and leisure represents a potential major new source of opportunities for travel providers, architects, urban planners, and designers and producers of cars, IVIS, and ADAS.

Due to demographic changes, older and disabled people represent a significant and permanently growing part of the Czech population. Over the past decade, social awareness of the requirements of older and disabled people has not progressively increased throughout the Czech Republic and other parts of Europe. This awareness should move from making provision for older and disabled people on a welfare-oriented basis towards a greater understanding of elderly needs and equal access to all facilities as a matter of human rights. Improving access to any form of travel will provide additional social as well as economic benefits at personal, governmental, and commercial levels. Faced with the challenges of an ageing European population, we would like to encourage innovative solutions that transform this challenge into an opportunity by responding to the needs of older Europeans and making active ageing a reality, keeping older people healthy, mobile, independent, socially engaged, and fulfilled as valued contributors to society.

Senior drivers may pose some danger because of limitations connected with age and the deterioration of their senses. New IT solutions may help; IVIS and ADAS solutions in particular are very promising, but many questions regarding their proper use remain unanswered. We need a recommended procedure that helps to understand existing human–machine interaction problems. Many needs become transparent only if appropriate methods are used. In our projects, a combination of qualitative, quantitative, and heuristic methods was chosen, and relevant questions were discussed and elaborated together with the target groups.

There is an increasing conviction among researchers that personal variables should be given greater consideration in studies of the effect of in-vehicle information systems (IVIS) on safety. Senior drivers' characteristics and features are inadequately, or not systematically, evaluated in behavioural research, especially in performance studies, which may confound research results. In fact, there is sufficient scientific evidence that driving is a result not only of drivers' maximum capabilities, but also of senior drivers' features, attitudes, and motives, which determine how old drivers use their cognitive-motor skills. It could be useful for this research to employ a mix of different methodologies that take into account the complexity of drivers' behaviour. In particular, the possibility to discriminate a priori between careful and risky drivers when evaluating certain behavioural patterns should be employed both in laboratory and real-traffic experiments, to evaluate the impact and the weight of those variables on the observed behaviour when an IVIS is in use. To overcome these difficulties, several levels of work were carried out: a review of existing psychological knowledge on personal features, attitudes, and motives influential for driving was confronted and integrated with the research run at VUT Brno on the development of a 'reference model' of typical behaviour, which was used as a reference case in simulator trials and in real-road observational studies. The input consists in the recognition of personal variables influencing performance as well as in the identification of existing assessment tools that could be appropriate for research requirements on this topic.


Potential biomarkers and risk factors associated with frailty syndrome in older adults – preliminary results from the BioFrail study

Armanda Teixeira-Gomes1,2*, Solange Costa1,2*, Bruna Lage1,2, Susana Silva1, Vanessa Valdiglesias3, Blanca Laffon3, João Paulo Teixeira1,2
* Both authors have contributed equally to the work

1 National Institute of Health, Environmental Health Dept, Rua Alexandre Herculano, 321, 4000-055 Porto, Portugal
2 EPIUnit - Instituto de Saúde Pública da Universidade do Porto, Rua das Taipas 135, 4050-091 Porto, Portugal
3 DICOMOSA Group, Dept of Psychology, Area of Psychobiology, Faculty of Education Sciences, Universidade da Coruña, Campus Elviña s/n, 15071 A Coruña, Spain

[email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Keywords: ageing, frailty, biomarkers, risk factors


One area of concern in the scientific community is the burden of environmentally induced disease in susceptible populations. Older adults are a well-recognized susceptible population due to the decline of immune defences and the burden of multiple chronic diseases. Frailty is a multidimensional geriatric syndrome characterised by increased vulnerability and functional decline that may be reversed if addressed early. Our goal was to explore possible biomarkers associated with frailty and its risk factors (social, medical, and occupational history).

A group of older adults (≥ 65 years old) was engaged in this study. Frailty status was assessed via Fried's frailty model, classifying 47.5% of participants as robust, 49.2% as pre-frail, and 3.3% as frail. A significantly higher prevalence of second-hand smokers was observed in the pre-frail group when compared to the robust group. Additionally, a higher prevalence of individuals consuming home-produced vegetables was found among the robust in comparison with the pre-frail.
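Fried's frailty phenotype, used above, is conventionally scored over five binary criteria: 0 positive criteria maps to robust, 1-2 to pre-frail, and 3 or more to frail. The sketch below illustrates that standard scoring rule only; the exact operationalisation of each criterion in the BioFrail study is an assumption, not quoted from the abstract.

```python
# The five standard Fried criteria (names are the conventional ones,
# not necessarily the labels used by the study).
FRIED_CRITERIA = (
    "unintentional_weight_loss",
    "self_reported_exhaustion",
    "weak_grip_strength",
    "slow_walking_speed",
    "low_physical_activity",
)

def frailty_status(positive_criteria):
    """Map a set of positive Fried criteria to a frailty category."""
    n = sum(1 for c in FRIED_CRITERIA if c in positive_criteria)
    if n == 0:
        return "robust"
    return "pre-frail" if n <= 2 else "frail"

print(frailty_status(set()))                   # -> robust
print(frailty_status({"slow_walking_speed"}))  # -> pre-frail
print(frailty_status({"weak_grip_strength",
                      "slow_walking_speed",
                      "low_physical_activity"}))  # -> frail
```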

The comet assay was used to assess DNA damage in whole blood, and the H2AX assay for H2AX phosphorylation in lymphocytes. Blood samples were also used for the quantification of mercury levels. Key exposures were assessed through a lifetime exposure questionnaire, and the influence of nutritional status, cognitive function, and functional status was evaluated by standardized scales. No significant differences were found between robust and pre-frail participants regarding the biological endpoints tested. Nevertheless, among the robust group the levels of H2AX phosphorylation were significantly increased in former smokers compared with never smokers. In addition, oxidative DNA damage was decreased in robust individuals consuming home-produced vegetables.

The preliminary results obtained in this study are of paramount importance to understanding whether the way we live(d) or worked can impact the way we age. Further investigation is necessary to explore the role of key exposures and their impact on health, to standardize biomarkers for use in clinics, and to understand the influence of clinical parameters.

Acknowledgments

The authors acknowledge the contribution of the COST Action CA16226 (Indoor living space improvement: Smart Habitat for the Elderly). The work developed by Armanda Teixeira-Gomes and Solange Costa is supported by the Portuguese Foundation for Science and Technology (FCT) under grants SFRH/BD/121802/2016 and SFRH/BPD/100948/2014, respectively. Vanessa Valdiglesias was supported by a Xunta de Galicia postdoctoral fellowship (reference ED481B 2016/190-0).

Consciousness Society creation. Stage development

Dumitru Todoroi

1 Academy of Economic Studies of Moldova, Republic of Moldova
[email protected]

Keywords: artificial intelligence, creativity, emotions, consciousness society, human – robotic society


Purpose: This exploratory paper investigates Adaptability as the Engine to create Computer Software for the Consciousness Society. Adaptable Tools advance Software for Artificial Intelligences. The paper aims to open the discussion around the impact that Adaptable Software Tools might have on Information, Knowledge-Based, and Consciousness Societies, i.e. the Homo-Robotic Societies of the Information Era.

Design/methodology/approach: The paper employs an exploratory literature review investigating the development and current state of the art in relation to Adaptability as an Engine to create Software for Computers, Systems, Networks, and Complexes of Intelligent machines in the Information Era; this literature review serves as the starting point of subsequent theorizing.

Findings: Based on the literature review, we theorize that Adaptable Tools were used to create Software for different Generations of Computers, Systems, Networks, and Complexes of Intelligent machines, the last representing ROBO-intelligences with creativity, emotions, temperaments, and sentiments. In this process Adaptable Tools achieve a new horizon of creation: they are transformed into a new Engine, the Robotic Adaptable Tools. To name just a few uses, Robotic Adaptable Tools can help in: (1) supporting definitions of new robotic intelligence entities, (2) their stratification, (3) their algorithmic representation, and therefore (4) improving robotic skills and competences, as well as (5) generating requirements for new competences and (6) promoting a collaborative environment among the Actors of the Computing Industry.

Research limitations/implications: This paper opens the discussion around creating Software for Computers, Systems, Networks, and Complexes of machines using Adaptable Tools, as well as its succession in creating Artificial Intelligences with creative, emotional, temperamental, and sentimental capabilities using Robotic Adaptable Tools. The paper suggests a wide range of areas for further research in the Robotic Industry.

Practical implications: In this paper we argue that, by looking at Robotic Adaptable Tools as more than just a set of tools for improving robotic intelligences, Adaptable Computing in the Robotic Industry can address some pitfalls of a particular type of Homo-Robotic communication in the Consciousness Society.

Originality/value/sustainability: The Adaptable Tools have been developed as part of the Software Industry in the Information Era. They have been used in creating Computer Systems for different generations of computers. The Robotic Adaptable Tools are a new and very popular approach, demonstrated to be powerful in many areas of the Artificial Intelligences Industry. This paper is novel in that it initiates a dialogue around the impact that Robotic Adaptable Tools might have on Human-Robotic and Robotic-Human Societies.

JEL Classification: C88, L86, M00, O31, L20.

Attempts to improve the situation of elderly people in the Republic of Moldova

Dumitru Todoroi

1 Academy of Economic Studies of Moldova, Republic of Moldova
[email protected]

Keywords: robots, intelligence, consciousness society, age-friendly digital world, human – robotic society


Introduction

Human society has entered a period of intensive study of the possibilities for people and robots to work together psychologically in the future Human–Robotic Consciousness Society.

Research projects are proposed for the implementation of population support activities in the Republic of Moldova in the context of improving the lives of young people and rural intellectuals and, in particular, improving the living conditions of women, girls, and elders. Special attention is required to the study of robots' influence on the lives of the aged.

The project “Solutions for migration in the rural sector of the Republic of Moldova” is concerned with anti-migration management in the Republic of Moldova. The project aims to decrease the number of labour migrants from the Republic of Moldova by 40% by creating new workplaces and developing abilities to work according to European and global standards.

The project “Solutions for migration of intellectuals from the rural sector of the Republic of Moldova” was to implement a twelve-month activity teaching about 80 intellectuals from the rural sector of the Republic of Moldova how to write different kinds of projects in order to attract local, national, and foreign investors and to raise the rural sector in different areas.

The general objective of the project “ENtrepreneurial and DIgital Skills from school to university – a solution for future economic development of North-East Romania and the Republic of Moldova. Educational network around the RO-MD border” is to support economic development on both sides of the Romania–Republic of Moldova border through building a joint network of educational institutions to pilot a program on entrepreneurial and digital education.

The project “Business plans for Women’s rural SMEs” will provide business development training for women and girls on how to write SOFT business plans for women’s rural SMEs.

The main objective of the project “Rural SMEs for the Network ‘Demand & Supply’” consists in training the population of the Ungheni District’s rural localities to prepare digital business plans for rural SMEs.

The proposal “The creation of the European Network for the implementation and support of industry for elderly people” contains strategic goals and spheres of activity for people of age.

The proposal “Organization of the international on-line conference ‘Creation of the Consciousness Society’” in March 2020 constitutes the creation of a network on the collaboration of people and robots in physical, intellectual, and spiritual life in an age-friendly Consciousness Society.

Conclusion

These propositions represent an investigation of anti-migration procedures in the rural localities of the Republic of Moldova through the creation of digital business plans with a special orientation. Training activities for youth, women, and the elderly are proposed as part of the anti-migration procedures for the population of the Republic of Moldova.

The proposition for COST Activities, “Creation of a group of researchers to work on the International Interdisciplinary Network on Health and Wellbeing in an Age-friendly Digital World,” shall intensify research collaboration opportunities for people and robots in the physical, intellectual, and spiritual dimensions. Special attention is required to the study of robots’ influence on the lives of the aged.

Similarly, perspectives on the ageing life course and the role of the elderly within society have made the elderly more essential to maintaining the workforce and offer opportunities for older adults to remain economically active long after traditional retirement ages.

JEL Classification: C88, L86, M00, O31, L20.

Quantitative analysis of publications on policies for ageing at work

Jasmina Baraković Husić1, Sabina Baraković2, Petre Lameski3, Eftim Zdravevski3, Vladimir Trajkovik3, Ivan Chorbev3, Igor Ljubi4, Petra Marešova5, Ondrej Krejcar5, Signe Tomsone6

1 University of Sarajevo, Faculty of Electrical Engineering, Sarajevo, B&H
2 University of Sarajevo, Faculty of Transport and Communications, Sarajevo, B&H
3 Ss. Cyril and Methodius University, Faculty of Computer Science and Engineering, Skopje, North Macedonia
4 Ministry of Public Administration, Zagreb, Croatia
5 University of Hradec Kralove, Faculty of Informatics and Management, Hradec Kralove, Czech Republic
6 Rīga Stradiņš University, Faculty of Rehabilitation, Riga, Latvia

[email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Keywords: aging at work, policy, quantitative analysis


Abstract. The concept of Ageing at work comes from the urgent requirement to help the ageing workforce of contemporary industries maintain productivity while achieving a work–life balance. Current research activities on policies covering the concept of Ageing at work are limited and conceptually different. This paper provides a quantitative analysis of publications on policies for Ageing at work. After a search of databases (i.e., PubMed, IEEE Xplore, and Springer), a total of 1724 publications were selected. The quantitative analysis shows that age-based discrimination was the main motivation for research activities, which were directed to the discussion of legislation and national policy solutions, as well as the intensification of collaboration at the international level. These findings may serve as a starting point for future research activities intended to propose a policy framework that describes the concept of Ageing at work.

Introduction

Demographic data suggest rapid aging of the active workforce, as a result of the overall aging of the population. The concept of Ageing at work comes from the urgent requirement to help the ageing workforce of contemporary industries maintain productivity while achieving a work–life balance. While there is much research regarding the aging population (i.e., over 65), current research activities on policies covering the concept of Ageing at work are limited and conceptually different.

The objective of this paper is to perform a quantitative analysis of publications on policies for ageing at work, which is intended to start the discussion and formulation of a policy framework that describes the concept of Ageing at work. It targets government decision-makers, the non-governmental sector, the private sector, and all others responsible for the formulation of policies on Ageing at work.

Methodology

A quantitative analysis of publications on policies associated with the concept of Ageing at work was conducted. A publication search was undertaken in August 2019 in order to identify published peer-reviewed articles in English. The databases searched included PubMed, IEEE Xplore, and Springer (covering publications from 2008 until 2019). The keywords included the following phrases: “successful aging at work”, “active aging at work”, “healthy aging at work”, “productive aging at work”, “technology for active aging at work”, “older adults at work”, and “successful aging in gerontology and life span psychology”. In addition, the Motivation and Solution property groups were used for the selection of publications for the analysis.
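The keyword-based selection step described above can be sketched as a simple case-insensitive substring search over abstracts. The phrase list is taken from the text; the matching logic itself is an assumption for illustration, not the authors' actual tooling.

```python
PHRASES = [
    "successful aging at work",
    "active aging at work",
    "healthy aging at work",
    "productive aging at work",
    "technology for active aging at work",
    "older adults at work",
    "successful aging in gerontology and life span psychology",
]

def matches(abstract, phrases=PHRASES):
    """Return the subset of search phrases found in an abstract."""
    text = abstract.lower()
    return [p for p in phrases if p in text]

sample = "We study technology for active aging at work among older adults at work."
# Note: plain substring matching means the shorter phrase "active aging at
# work" also fires inside the longer "technology for active aging at work".
print(matches(sample))
# -> ['active aging at work', 'technology for active aging at work', 'older adults at work']
```

The overlap noted in the comment is one reason per-phrase article counts from such a search can exceed the number of distinct articles.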

Results

A total of 47,330 publications were identified through database searching, and 25,187 publications were screened. After 7,756 screened publications were excluded from further analysis, a total of 17,431 article abstracts were assessed for eligibility. Finally, a total of 1724 publications were selected for further qualitative analysis.

The most keyword matches were found in the PubMed database (1094 articles), followed by Springer (584 articles) and IEEE Xplore (56 articles). “Older adults at work” is the most common keyword phrase, found in 861 articles, followed by “successful aging at work” (261 articles), “active aging at work” (211 articles), “productive aging at work” (190 articles), “healthy aging at work” (175 articles), “technology for active aging at work” (23 articles), and “successful aging in gerontology and life span psychology” (3 articles). The most keyword matches (226) were contained in articles published in 2016.

The Solutions property group was analyzed more often than the Motivation property group. “Age-based discrimination” was the most frequently analyzed Motivation property, mentioned in 775 articles. “Legislation” is the most prominent Solutions property, contained in 585 selected articles. The highest co-occurrence (21) of both Motivation and Solutions properties in Springer publications was found in articles considering the “national policy” and “sustainable growth” properties.

Conclusion

“Age-based discrimination” was the main motivation for research activities, which were directed to the discussion of “legislation” and “national policy” solutions and the intensification of collaboration at the international level. The conducted quantitative analysis serves as a basis for a qualitative analysis intended to propose a policy framework providing a comprehensive approach for addressing the challenges related to the concept of Ageing at work and the development of specific actions at national and international levels. Therefore, an agenda for future research is formulated: (1) conduct the qualitative analysis to propose the policy framework on Ageing at work, (2) conduct a meta-analysis on factors influencing the concept of Aging at work, and (3) design a national-level and EU-level survey about the current policies associated with the concept of Ageing at work.

Can deep learning contribute to healthy aging at work?

Petre Lameski1, Eftim Zdravevski1, Ivan Chorbev1, Rossitza Goleva2, Nuno Pombo3, Nuno M. Garcia3, Ivan Pires3, Vladimir Trajkovik1

1 Ss. Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering, North Macedonia
2 New Bulgarian University, Department of Informatics, Sofia, Bulgaria
3 Universidade da Beira Interior, Portugal

[email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Keywords: deep learning, healthy aging at work, machine learning


The increased aging of the population has introduced quite a few challenges for developed and developing countries. For the European Union it is estimated that by 2050 there will be a very large percentage of older population and that the burden on the health insurance and pension systems will increase significantly. One way to tackle this problem is to provide a healthy environment for the aging population in their working premises, which would increase their productivity and prolong their ability to provide income for themselves and the community.

Machine learning has contributed to elevating various areas of human society and industrial development. In recent years deep learning, as a branch of machine learning, has received increased interest due to the excellent state-of-the-art results obtained with deep learning based approaches.

In this paper we review the development of deep learning research and applications that contribute towards healthy aging at work. For this purpose, we perform a quantitative analysis of the available publications in IEEE, PubMed, and Springer, using an automated survey based on the NLP toolkit presented in (Zdravevski et al. 2017). We review papers published from 2009 to 2019 that contain any of the phrases: “successful aging at work”, “active aging at work”, “healthy aging at work”, “productive aging at work”, “technology for active aging at work”, “older adults at work”, or “successful aging in gerontology and life span psychology”. Additionally, we select papers that also contain phrases related to Artificial Intelligence and its subfields. The obtained results are depicted in Figure 1.
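The two-stage filter described above (an ageing-at-work phrase AND an AI-related phrase, with counts grouped by year to expose trends) might be sketched as follows. The sample records and the AI term list are purely illustrative; the actual toolkit of Zdravevski et al. (2017) is considerably richer than this.

```python
from collections import Counter

AGEING = ("healthy aging at work", "older adults at work")
AI = ("deep learning", "convolutional neural network", "recurrent neural network")

# Illustrative records standing in for database search results.
papers = [
    {"year": 2017, "abstract": "Deep learning for older adults at work."},
    {"year": 2018, "abstract": "A convolutional neural network monitors healthy aging at work."},
    {"year": 2018, "abstract": "Ergonomics survey with no AI content."},
]

def relevant(paper):
    """Keep a paper only if it matches both an ageing and an AI phrase."""
    text = paper["abstract"].lower()
    return any(p in text for p in AGEING) and any(p in text for p in AI)

# Per-year counts of relevant papers, i.e. the data behind a trend plot.
trend = Counter(p["year"] for p in papers if relevant(p))
print(sorted(trend.items()))  # -> [(2017, 1), (2018, 1)]
```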

Figure 1: AI-related topic trends

As can be observed, there is an increasing trend in publications, and thus in scientific interest, for applications for healthy aging at work and applications of machine learning and the internet of things, which are quite popular. Another observation is that deep learning related topics, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and the term deep learning itself, are also trending, with an almost linear increase in the number of publications. Without a qualitative analysis of the papers, one can conclude that deep learning topics are becoming more and more relevant in healthy aging at work research, and we can expect an increased number of publications and applications in the following period.


Acknowledgments

The work presented in this paper is partially funded by the Ss. Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering, North Macedonia. We also acknowledge the support given for publishing this paper from the SHELD-ON COST Action CA16226.

Work and social activity for longer healthy life

Lilia Raycheva

The St. Kliment Ohridski Sofia University, Bulgaria
[email protected]

Keywords: work, social activity, healthy life, Bulgaria


Institutional attention to the ageing population began to appear in the last decade of the 20th century. On December 16, 1991 the United Nations General Assembly adopted the Principles for Older Persons (Resolution 46/91). The 18 principles outlined are grouped under five themes: independence, participation, care, self-fulfilment, and dignity. The section on dignity contains a text against age discrimination: “Older persons should be treated fairly regardless of age, gender, racial or ethnic background, disability or other status, and be valued independently of their economic contribution” (United Nations, 1991).

The Madrid International Plan of Action on Ageing (MIPAA) and the Political Declaration, adopted by the Second World Assembly on Ageing in April 2002, are still among the global guiding documents with a priority focus on the rights of older adults and their well-being in a supportive environment (United Nations, 2014).

According to the National Statistical Institute, Bulgaria ranks 7th in the proportion of senior citizens over 65, at 21%. The forecasts are that by 2060 life expectancy at 65 in the country will increase by roughly 6.3 years for men and 6.1 years for women. A very serious change is expected in the old-age dependency ratio, which is expressed as the population aged 65 and over as a percentage of the population aged 15-64. For 2060 its value is projected to reach 58, while currently it is 30 (National Statistical Institute, 2019).
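The dependency ratio used above has a simple form: people aged 65 and over per 100 people of working age (15-64). The sketch below shows the formula; the population figures are invented solely to illustrate how a ratio of about 30, the current Bulgarian value cited in the text, arises.

```python
def old_age_dependency_ratio(pop_65_plus, pop_15_64):
    """Population 65+ per 100 people of working age (15-64)."""
    return 100 * pop_65_plus / pop_15_64

# e.g. 1.38 million aged 65+ against 4.6 million aged 15-64 gives ~30.
print(round(old_age_dependency_ratio(1_380_000, 4_600_000)))  # -> 30
```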

Demographic imbalances, such as population decline and ageing, strongly impact workforce developments. They create problems for the macro-fiscal stability and sustainability of all social systems: the labor market, retirement and pension schemes, healthcare arrangements, social assistance and long-term care, the education system, etc.

The European Year 2012 covered three dimensions of active ageing:

• Active and longer working life of the elderly, including: encouraging the elderly to stay in work, which requires improving working conditions and adapting them to the health status and needs of older workers; updating their skills by providing better access to lifelong learning; and reviewing tax and benefit systems to ensure effective incentives to work longer.
• Participation of the elderly in social activities, understood as: improving the opportunities and conditions for older people to contribute to the community through volunteering or as family assistants, and their inclusion in society.
• Creating conditions for independent living by: promoting healthy lifestyles; preventing socially significant illnesses; and creating a living environment more conducive for older adults (public buildings, infrastructure, transport, residential buildings) that allows them to remain independent for as long as possible.

In order to achieve these goals, Bulgaria has developed two strategically important documents that outline the framework of its actions: the National Concept for Promoting the Active Life of the Elderly (2012-2013), which introduces a horizontal integrated approach and mainstreaming of ageing in all areas, sectors and policies, and the National Strategy for Active Life of the Elderly in Bulgaria (2019-2030), which develops a comprehensive cross-sectoral approach for an active and productive elderly life in good health. The emphasis in both documents is not only on the population demographic balance (birth rate, mortality, migration), but also on policies and measures for developing the quality of human resources with regard to improving the health, educational and general social status of people, as well as upgrading the conditions and quality of their lives.


Bulgaria is the second country in the European Union, after Poland, to have started work on introducing at national and regional level the Active Ageing Index (AAI), jointly developed in 2012 by the United Nations Economic Commission for Europe and the European Commission as a key monitoring tool that enables policy makers to devise evidence-informed strategies for dealing with the challenges of population ageing and its impacts on society. The results so far show that an active longer working life and intense participation in social activities create conditions for independent living and a longer healthy life of the elderly in the country.

References

• United Nations. 1991. The United Nations Principles for Older Persons.
• United Nations. 2014. Madrid International Plan of Action on Ageing.
• National Statistical Institute. 2019. Population Projections by Sex and Age.
• Ministry of Labour and Social Policy. 2019. National Strategy for Active Life of the Elderly in Bulgaria (2019-2030).

Acknowledgments

The text has been developed within the framework of the COST Action CA 16226: SHELD-ON supported by the research project of the National Scientific Fund of Bulgaria: KP-06-COST-18.06.2019.

Reverse mentoring as an alternative strategy for lifelong learning in the workplace

Nežka Sajinčič1, Andreja Istenič Starčič2,3, Anna Sandak1,4

1 InnoRenew CoE, Livade 6, 6310 Izola, Slovenia
2 Faculty of Education, University of Primorska, Cankarjeva 5, 6000 Koper, Slovenia
3 Faculty of Civil and Geodetic Engineering, University of Ljubljana, Jamova 2, 1000 Ljubljana, Slovenia
4 Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, 6000 Koper, Slovenia
[email protected], [email protected], [email protected], [email protected], [email protected]

Keywords: reverse mentoring, older workers, lifelong learning, intergenerational learning


With advances in technology, many jobs are fading and new ones are emerging. Knowledge gained years ago can quickly become obsolete in today’s world, so the ability to stay up-to-date is more important than ever. Due to the ageing population and the increase in the proportion of older workers, employers need to dedicate their attention not only to the development of young and upcoming employees, but also to senior ones. However, cognitive changes such as slower processing make storing and retrieving information from long-term memory harder for older employees (Ford and Orel 2005), and they may benefit from a more individualised, slower-paced method of learning. A meta-analysis also found that older workers are less willing to participate in training and career development activities (Ng and Feldman 2012). Consequently, there is an increasing need to find alternative education strategies that are effective and suitable for older workers.

A popular option for knowledge sharing is a mentoring relationship, where a mentor, often an experienced senior professional, helps a mentee, typically a new junior employee, learn and develop their career with psychological support and role modelling (Chen 2013). In addition to developing explicit knowledge, mentorship can also enhance mentees’ psychological capital, especially their self-efficacy (Luthans and Youssef 2004) and resilience (Istenič Starčič and Mikoš 2019). As learning through one-on-one interaction may be more appropriate for older workers than training, we consider an alternative way to promote knowledge transfer to seniors – reverse mentoring. The main functions remain largely the same as in traditional mentoring (Chen 2013), only it involves a younger employee acting as mentor and sharing expertise with an older colleague (Murphy 2012).

The first organization to formally implement the concept was General Electric Corporation in 1999, where 500 upper-level managers were paired with technologically savvier younger employees who successfully tutored them in using the internet (Murphy 2012). Reverse mentoring can be useful for senior workers not only to learn about current trends and obtain new technical knowledge and skills (e.g., technological improvements, use of social media), but also to gain valuable generational insights from direct social interactions (Harvey et al. 2009). However, even if the main idea behind reverse mentoring is building competencies for the older workers’ future careers, the program is also beneficial to the mentors, who develop leadership and communication skills and business knowledge, and to the organization, which, by using an innovative and cost-effective strategy, encourages relationship development and promotes organizational learning, cooperation, and the dismantling of age-related stereotypes (Murphy 2012).

As with any other (mentoring) relationship, there may be challenges that can hinder its success. In planning the mentoring program, organizations must define its objectives, address the potential lack of time, and consider individual differences in personality, values, and participants’ willingness to make mistakes and look past their seniority and authority (Murphy 2012).

Empirical research on reverse mentoring is still in its infancy, but results look promising and its potential benefits are making the model increasingly popular. As older workers are intrinsically motivated to further develop their digital skills (Kaše et al. 2019), reverse mentoring seems like an attractive option to encourage the lifelong learning of those employees who may be hesitant to participate in training programs.

References

• Chen Y.-C. 2013. Effect of reverse mentoring on traditional mentoring functions. Leadership and Management in Engineering, 13, 3: 199–208

• Ford R., Orel N. 2005. Older adult learners in the workforce. Journal of Career Development, 32, 2: 139–152
• Harvey M., McIntyre J., Heames J., Moeller M. 2009. Mentoring global female managers in the global marketplace: Traditional, reverse, and reciprocal mentoring. International Journal of Human Resource Management, 20, 6: 1344–1361
• Istenič Starčič A., Mikoš M. 2019. Working mentors for UL FGG students: A link between academic and work environment. Gradbeni vestnik, 68, 4: 98–105
• Kaše R., Saksida T., Mihelič K. K. 2019. Skill development in reverse mentoring: Motivational processes of mentors and learners. Human Resource Management, 58, 1: 57–69
• Luthans F., Youssef C. M. 2004. Human, social, and now positive psychological capital management: Investing in people for competitive advantage. Organizational Dynamics, 33, 2: 143–160
• Murphy W. 2012. Reverse mentoring at work: Fostering cross-generational learning and developing millennial leaders. Human Resource Management, 51, 4: 549–573
• Ng T., Feldman D. 2012. Evaluating six common stereotypes about older workers with meta-analytical data. Personnel Psychology, 65, 4: 821–858

Acknowledgments

The authors gratefully acknowledge the European Commission for funding the InnoRenew CoE project (Grant Agreement #739574) under the H2020 Widespread-Teaming programme and the Republic of Slovenia for funds from the European Regional Development Fund.

What users want – research on workers’, employers’ and caregivers’ demands on the SmartWork AI system

Willeke van Staalduinen, Carina Dantas, Sofia Ortet

Cáritas Diocesana de Coimbra, Portugal [email protected], [email protected], [email protected], [email protected]

Keywords: user requirements, AI system, workability, older workers


Introduction

SmartWork, ‘Smart Age-Friendly Living and Working Environment’, is a European project addressing a key challenge facing today’s older generation, who are living and working longer than their predecessors: the design and realisation of age-friendly living and working spaces. The SmartWork system will comprise a suite of smart services to support office workers aged 55+, delivering benefits for the workers, their employers and carers. The system will use Artificial Intelligence (AI) to unobtrusively and pervasively monitor workers’ health, behaviour, and cognitive and emotional status. Through work ability modelling, it will respond to their needs by, e.g., identifying personalised training support for the employee to learn new skills and suggesting flexible working practices to maintain a work/life balance, while the monitoring data will enable the employee to self-manage chronic health conditions.

Figure 1: Concept SmartWork AI system

Research on user requirements

The SmartWork system will be tested in Portugal and Denmark in several pilots in office environments. To develop the system, we performed a desk study and an end-user consultation among Portuguese and Danish older office workers, employers/managers and caregivers of older workers in order to define the user needs.

Online questionnaires in English, Danish and Portuguese were used to learn more about the work and preferences of the engaged stakeholders. From Aarhus Municipality (Denmark), 49 office workers participated; from Cáritas Coimbra (Portugal), 50 office workers from the appropriate age group responded. Their results were compared with those of 60 respondents from several other European countries, mainly The Netherlands, Greece and Finland. In addition to the employees, 12 Portuguese and 10 Danish employers/managers filled in the questionnaire, and 10 caregivers from each country participated.


Consulted on their preferences for the SmartWork AI system, older workers most often rate as useful or very useful an application that informs them about meetings and events, provides guidance, reminds them of appointments, provides training content, transfers work between devices, and manages or organises their work. In the European questionnaire, the most preferred feature would be a device that automatically chooses individual settings. Not very useful or not useful at all, for the Danish and Portuguese employees consulted, are applications that check their health status every minute, day or week. They also do not find it interesting if the system provides company, reports on the time spent at the computer, or informs the boss about their performance. The SmartWork system should preferably become available on a smartphone or on a desktop/laptop. The most preferred modes of interaction are keyboard, touch screen, sensor, speech or pictograms.

Regarding the preferred functionalities of the SmartWork system, Danish employers express quite opposite opinions compared with their Portuguese colleagues. Where Portuguese employers would like to have a system (from most to least favoured) that supports on-the-fly work practices, identifies training needs, identifies needs for workplace adaptations, supports optimal employee pairing, reports the health and condition of the worker, and reports on progress, the majority of Danish employers find these functionalities not very useful or not useful at all. Danish caregivers, however, were more positive about the SmartWork assets, expressing a higher preference for a system that provides information on health risks and monitors the status of the worker they care for.

SmartWork will be piloted in Portugal and Denmark for about 20 months with 72 older office workers, managers and caregivers. Results are expected in early 2022.

References

• SmartWork, D2.2 – First version of co-design methodology, user requirements and use cases, 2019, GA 826343 (H2020-SC1-DTH-2018-1).

Healthy aging at work: Trends in scientific publications

Eftim Zdravevski1, Petre Lameski1, Ivan Chorbev1, Rossitza Goleva2, Nuno Pombo3, Nuno M. Garcia3, Ivan Pires3, Vladimir Trajkovik1

1 Ss Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering, North Macedonia
2 New Bulgarian University, Department of Informatics, Sofia, Bulgaria
3 Universidade da Beira Interior, Portugal

[email protected], [email protected], [email protected]. mk, [email protected], [email protected], [email protected], impires@ it.ubi.pt, [email protected],

Keywords: healthy aging at work, active aging, older adults


For the increasingly aging populations of many developed countries, healthy aging and wellbeing have become a major concern. Therefore, in this paper we apply NLP-based methods for the automatic analysis of abstracts of scientific publications to identify trends in academic publishing. We analysed articles indexed in the IEEE Xplore, PubMed, and Springer digital libraries in accordance with the PRISMA methodology adapted for scoping surveys. By identifying keywords and properties relevant to this topic, we aimed to identify the papers that contain them and then to determine whether there are any general trends.

First, all three libraries were searched with the following keywords: “successful aging at work”, “active aging at work”, “healthy aging at work”, “productive aging at work”, “technology for active aging at work”, “older adults at work”, and “successful aging in gerontology and life span psychology”. Figure 1 shows the distribution of identified potentially relevant papers based on these keywords.

Figure 1: Papers found per digital library for the searched keywords and related topics

Then, we analysed the identified potentially relevant articles from the perspective of different topics: Proactive behaviour, Subjective and objective criteria for successful aging, Explanatory mechanisms, Facilitating and constraining factors, Temporal patterns, Life style, Study types, Job characteristics, Medical conditions, Job problems and concerns, AI topics, and Smart applications. For each of these topics, several properties were identified, and each article was then tagged according to whether or not it contains those topics.
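The tagging step just described can be sketched as a simple keyword-matching pass over abstracts. The topic keyword lists, the helper name tag_abstract, and the sample abstracts below are illustrative assumptions, not the actual lists or data used in the study:

```python
# Hedged sketch of keyword-based topic tagging: each abstract is marked
# with the topics whose keywords it mentions. Topic lists and abstracts
# here are illustrative placeholders.

TOPIC_KEYWORDS = {
    "Life style": ["lifestyle", "physical activity", "diet"],
    "Job characteristics": ["workload", "shift work", "job demands"],
    "Smart applications": ["wearable", "sensor", "mobile app"],
}

def tag_abstract(abstract: str) -> set[str]:
    """Return the set of topics whose keywords appear in the abstract."""
    text = abstract.lower()
    return {topic for topic, kws in TOPIC_KEYWORDS.items()
            if any(kw in text for kw in kws)}

abstracts = [
    "Shift work and workload effects on healthy aging of older employees.",
    "A wearable sensor platform supporting physical activity at work.",
]
tags = [tag_abstract(a) for a in abstracts]
```

Aggregating such boolean tags per publication year would then yield trend counts like those shown in the figures.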


Figure 2: There is a clear increasing trend in publications in almost all topics related to healthy aging at work.

Based on the initial results, we observe that the increase in publications per year is significant and that the attention paid to the problem of healthy aging by the scientific community is growing each year. Furthermore, we observe that most of the publications discuss the facilitating and constraining factors in working environments and the explanatory mechanisms in the environment. Of note, there is a lack of publications concerning smart applications that address healthy aging at work, and this topic does not follow the general trend in publication growth.

References

• Zdravevski E. et al. (2019) Automation in Systematic, Scoping and Rapid Reviews by an NLP Toolkit: A Case Study in Enhanced Living Environments. In: Ganchev I., Garcia N., Dobre C., Mavromoustakis C., Goleva R. (eds) Enhanced Living Environments. Lecture Notes in Computer Science, vol 11369. Springer, Cham

Acknowledgments

The work presented in this paper is partially funded by the Ss. Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering, North Macedonia. We also acknowledge the support for publishing this paper from the SHELD-ON COST Action CA16226.

Challenges in building AI-based connected health services for older adults

Hrvoje Belani1, Igor Ljubi2, Kristina Drusany Starič3

1 Ministry of Health, Zagreb, Croatia 2 VERN’ University of Applied Sciences, Zagreb, Croatia 3 University Medical Centre Ljubljana, Ljubljana, Slovenia

[email protected], [email protected], [email protected]

Keywords: artificial intelligence, connected health, data science, digital inclusive- ness, older adults


Artificial intelligence (AI) refers to systems manifesting intelligent behaviour by analysing their environment and taking actions, with some degree of autonomy, to achieve specific goals (COM 237 final. 2018). AI-based systems can be software-based, such as voice assistants, image analysis software, and face recognition systems, or embedded in hardware devices, such as medical robots, autonomous cars, and Internet of Things (IoT) applications. AI today is helping to solve some of the biggest global challenges, from treating chronic diseases or reducing fatality rates in traffic accidents to fighting climate change or anticipating cybersecurity threats (COM 168 final. 2019).

Connected health (CH), as a new paradigm, manages individual and community health in a holistic manner by leveraging a variety of technologies, and has the potential to incorporate telehealth and integrated care services, covering the whole spectrum of health-related services and addressing both healthy subjects and chronic patients (Mountford et al. 2016). CH services directly address patients and citizens at large in managing disease, health and wellness, and their reorganization is expected to bring high impact in the health care domain (Chouvarda et al. 2019).

The EU population will increase from 511 million in 2016 to 520 million in 2070, but the working-age population (people aged 15-64) will decrease significantly, from 333 million in 2016 to 292 million in 2070. These population structure projections reflect assumptions on fertility rates, life expectancy and migration flows. The old-age dependency ratio (people aged 65+ relative to those aged 15-64) in the EU will increase by 21.6 percentage points, from 29.6% in 2016 to 51.2% in 2070. This implies that the EU would go from having 3.3 working-age people for every person aged 65+ to only two working-age persons (COM ip 079. 2018).
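The ratios quoted above follow directly from the definition of the old-age dependency ratio; as a quick check of the arithmetic (a sketch using only the percentages cited from the report):

```python
# Old-age dependency ratio: population aged 65+ as a percentage of the
# population aged 15-64. Inverting it gives the number of working-age
# people per person aged 65+. Figures are the EU projections quoted above.
def working_age_per_senior(dependency_ratio_pct: float) -> float:
    """Working-age people per person aged 65+."""
    return 100.0 / dependency_ratio_pct

r2016 = working_age_per_senior(29.6)  # ~3.4 (the report rounds to 3.3)
r2070 = working_age_per_senior(51.2)  # ~2.0
increase = 51.2 - 29.6                # 21.6 percentage points
```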

Digital technologies and artificial intelligence will be vital tools in achieving universal health coverage (UHC), as expanding public and private sources of data and improving and expanding tools to analyse, visualize and model data increasingly allow researchers and policy-makers to identify patterns, problems and evidence for action (WHO, 2018). This work analyses the challenges in building AI-based connected health services for older adults in order to provide satisfactory results towards achieving UHC.

The analysis addresses not only AI-related challenges, such as the lack or imbalance of datasets and the lack of domain knowledge about algorithm reliability, but also privacy and confidentiality risks, as well as ethical challenges. We address the challenges in building AI-based connected health services for older adults from the perspectives given below (Table 1).

No.  Challenge groups
1    Health-related: patient triage and screening options, treatment options, clinic workflow
2    Target user group-related: availability of training data, normalized acquisition parameters
3    AI-related: data gap and visualization, imbalance of datasets, data labelling and analysis
4    Development process-related: skill gap between software engineers and data scientists
5    Domain knowledge-related: lack of AI expertise, privacy and confidentiality risks, ethics

Table 1: Challenges in building AI-based connected health services for older adults

We approach the given challenges by applying the Connected Health Impact Framework (Chouvarda et al. 2019) and analyzing them through the following axes: service description, user, outcome, and value proposition. Being oriented more towards older adults, we look deeper into barriers and enablers for building such AI-based connected health services. Finally, we map the challenges to the wearable technology options (Loncar-Turukalo et al. 2019) that can make connected health services for older adults more effective and efficient by fitting better into their life surroundings. As healthcare seems to be one of the most attractive areas for applying AI (Jiang et al. 2017), we address the opportunities as well as the obstacles to its current usage.

References

• Chouvarda I., Maramis C., Livitckaia K., Trajkovik V., Burmaoglu S., Belani H., Kool J., Lewandowski R. 2019. Connected Health Services: Framework for an Impact Assessment. J Med Internet Res 21(9):e14005. DOI: 10.2196/14005, PMID: 31482857
• European Commission: 2018 Ageing Report, institutional paper 079, ISBN 978-92-79-77460-7, Brussels, May 2018. Retrieved on August 31, 2019
• European Commission: Artificial Intelligence for Europe, COM (2018) 237 final, Brussels, 25.4.2018. Retrieved on August 31, 2019
• European Commission: Building Trust in Human-Centric Artificial Intelligence, COM (2019) 168 final, Brussels, 8.4.2019. Retrieved on August 31, 2019
• Jiang F., Jiang Y., Zhi H. et al. 2017. Artificial intelligence in healthcare: past, present and future. Stroke and Vascular Neurology 2:e000101. DOI: 10.1136/svn-2017-00010
• Loncar-Turukalo T., Zdravevski E., Machado da Silva J., Chouvarda I., Trajkovik V. 2019. Literature on Wearable Technology for Connected Health: Scoping Review of Research Trends, Advances, and Barriers. J Med Internet Res 21(9):e14017. DOI: 10.2196/14017, PMID: 31489843
• Mountford N., Kessie T. et al. 2016. Connected Health in Europe: Where are we today? Technical Report. University College Dublin, Ireland. ISBN 9781910963050. Retrieved August 11, 2019
• World Health Organization: Big data and artificial intelligence for achieving universal health coverage: an international consultation on ethics. Geneva, 2018 (WHO/HMM/IER/REK/2018.2). Licence: CC BY-NC-SA 3.0 IGO

Connected health and wellbeing for people with complex communication needs

Hrvoje Belani1, Kristina Drusany Starič2, Igor Ljubi3

1 Ministry of Health, Zagreb, Croatia 2 University Medical Centre Ljubljana, Ljubljana, Slovenia 3 VERN’ University of Applied Sciences, Zagreb, Croatia

[email protected], [email protected], [email protected]

Keywords: complex communication needs, connected health, wellbeing, assistive technologies, digital inclusiveness


The World Health Organization (WHO) defines universal health coverage (UHC) as its priority objective: ensuring that all people have access to needed health services, including prevention, promotion, treatment, rehabilitation and palliation, of sufficient quality to be effective, while also ensuring that the use of these services does not expose the user to financial hardship. WHO also reported (WHO 2016) that more than 50% of its member states have an e-health strategy, and 90% of e-health strategies reference the objectives of UHC or its key elements.

The term “e-health” is defined as the transfer of health resources and health care by electronic means. It encompasses three main areas: (1) the delivery of health information, for health professionals and health consumers, through the Internet and telecommunications; (2) using the power of IT and e-commerce to improve public health services, e.g. through the education and training of health workers; (3) the use of e-commerce and e-business practices in health systems management (WHO 2016). With more involvement of electronic systems and information and communication technology (ICT) in offering health services, the major international organizations, such as the European Union (EU), the International Telecommunication Union (ITU) and the European Space Agency (ESA), have officially adopted the denomination “e-health”. It refers to the use of modern information and communication technologies to meet the needs of citizens, patients, healthcare professionals, healthcare providers, as well as policy makers (EUMD 2003).

Connected health is a paradigm shift: looking after individual and community health in a process that speaks to the health journey of the person through the entire lifespan, leveraging a variety of technologies to do so (Mountford et al. 2016). The term has been coined in recent years to encompass various terms describing different ICT-enabled advances in health care, such as e-health, digital health, m-health, telehealth, telemedicine, remote care, and assisted living. The new concept of health management includes devices (such as sensors and wearables) and services, as well as interactions (e.g. interventions) designed with the patient in the centre.

We consider the concept of connected health and wellbeing (CHW) as an upgrade that deals with a person’s journey through life, considering not only health but also wellbeing aspects, such as positive relations with others, autonomy and self-acceptance. People with complex communication needs (CCN) are not fully able to deal with everyday communication situations by using only speech. They have trouble producing or understanding speech, which can also involve difficulties with reading and writing. Nevertheless, they also need to learn and utilize the technologies that empower them and ensure that the health and wellbeing status of every individual is fully accepted, along with his or her capabilities (Belani et al. 2016). This work analyses the aspects and challenges of connected health and wellbeing for people with complex communication needs, especially targeting older adults. The addressed research questions on CHW for people with CCN are given below (Table 1).


No.  Research questions
1    Which are the aspects of wellbeing? How do they relate to health? What do they mean to older adults?
2    What is the concept of connected health, and how can it be applied outside the health domain?
3    How are people with complex communication needs achieving their wellbeing? Which criteria have to be met?
4    How do connected health and wellbeing technologies help people with complex communication needs achieve digital inclusiveness? What accessibility and usability requirements have to be met?

Table 1: Research questions on connected health and wellbeing for people with CCN

In order to tackle these questions, we examine the role of ICTs and connected health technologies in achieving digital inclusiveness, so as to promote inclusion and equality in society for people with CCN and older adults who need support in understanding, communicating and learning how to achieve and maintain their health and wellbeing. We aim to learn from the area of clinical practice called augmentative and alternative communication (AAC), in order to build solutions that enable a connected health and wellbeing environment for people with CCN (Vučak et al. 2012).

References

• Belani H., Car Ž., Vuković M. 2016. Augmentative Requirements Engineering: Getting Closer to Sensitive User’s Needs. In: Usability- and Accessibility-Focused Requirements Engineering. Zürich, Switzerland: Springer International Publishing: 97-116
• EU Ministerial Declaration, eHealth 2003, High Level Conference, Brussels, 22 May 2003. Retrieved August 11, 2019
• Mountford N., Kessie T. et al. 2016. Connected Health in Europe: Where are we today? Technical Report. University College Dublin, Ireland. ISBN 9781910963050. Retrieved August 11, 2019
• Vučak I., Belani H., Vuković M. 2012. AAC Services Development: From Usability Requirements to the Reusable Components. In: Jezic G., Kusek M., Nguyen N.T., Howlett R.J., Jain L.C. (eds) Agent and Multi-Agent Systems. Technologies and Applications. KES-AMSTA 2012. Lecture Notes in Computer Science, vol 7327. Springer, Berlin, Heidelberg: 231-240
• World Health Organization. 2016. Global diffusion of eHealth: making universal health coverage achievable. Report of the third global survey on eHealth. ISBN 978-92-4-151178-0

Analysis of competences and needs of senior citizens related to domotics

Tamara Suttner, Paul Held

Innovation in Learning Institute, University Erlangen-Nuremberg, Germany

[email protected], [email protected]

Keywords: senior learners, smart home technologies, ICT based learning, problem-based learning, empowerment

SmartyourHome (Ref. 2018-1-DE02-KA204-005182) has been funded with support from the European Commission. This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.


Introduction: The European SmartyourHome project aims to enable older adults to understand the characteristics and possibilities of digitalization linked with smart-home concepts and to make active use of them. The project started with a competences and needs analysis of the target group regarding the topic of smart home; the results are reported here.

Method: Data collection followed two strands. In the first strand, an online survey was conducted with 176 older adults from Germany, Romania, Spain, Italy and Ireland (Table 1). It included 19 questions (free text or yes/no), and the survey period covered four weeks. In the second strand, focus groups were carried out in the same countries. The focus group method was intended to provide further in-depth insights and to widen the perspective by qualitative means. It allowed personal and direct contact with the seniors and yielded a richer picture of the older adults’ feelings and attitudes on the issue of smart home.

Sample: Since older adults are only just coming into contact with smart home technologies, the project obviously deals with early adopters, so the sample is limited to older adults with an affinity for technology. The results of this pilot study cannot be transferred to older people in general. Recruitment, for both the survey and the focus groups, took place via networks of the partners, covering organisations that deal with open-minded older adults.

Items             Germany  Romania  Spain  Italy  Ireland  Total
Respondents            24       23     80     29       20    176
Advanced/experts       18        6     28      7       16     75
Under 60 years         14       14     10      4       14     56
60-70 years             8        7     37     11        3     66
Over 70 years           1        2     33     13        3     52
Female                  7        9     45     13       13     87
Male                   16       14     35     15        7     87

Table 1: Target group characteristics

Across all countries, not only men took part in this technology-related survey; in Spain and Ireland, even more women than men participated. The interviewees’ age range was 50 to 90 years (see Table 1), with almost all of them living in rented flats or their own houses, primarily in urban areas or conurbations.

Results: Most of the respondents consider their ICT skills to be relatively low, but there are two different groups. In Germany, Ireland and Spain, the percentage of older adults who have no experience with computers, tablets or smartphones is low (below 5%); there are beginners, advanced users and even “experts” among them. In Romania and Italy, on the other hand, the number of older adults without ICT skills is much higher (around 20%), with most of them being beginners, few advanced and no experts. This means that in these countries the knowledge about smart home is still very low, and smart devices are hardly used by older adults. For this reason, the participants in Romania, Italy and even Ireland skipped the more specific questions on the concrete use of smart home applications; only Spain and Germany provided answers here. Some smart home devices are already partly in use there, but the programming and use of so-called micro-controllers is not an issue yet. Although many respondents in the partner countries have low ICT knowledge (see Schreurs et al. 2017), interest in smart home is relatively high; however, in some countries very few can imagine participating further in the project. The lack of initial ICT skills prevents a deeper involvement with new technologies. The interests of older adults in smart home application sectors can be summarized as follows (Table 2):


Applications                 Germany  Romania  Spain  Italy  Ireland  Total
Automation/comfort                 4        5      5      4        4     22
Home security                      3        2      3      1        3     12
Health                             5        1      2      2        5     15
Energy                             1        4      1      3        1     10
Entertainment/communication        2        3      4      5        2     16

Table 2: Interest in smart home applications (priority: 1 = most interest; 5 = least interest)

Smart services related to energy seem to raise the most interest, whereas home automation and comfort are ranked low. Of course, there are also concerns about these services: data protection/hacking/privacy, complexity, high costs, and doubts about utility and usability are the most frequent. Nevertheless, participants would be willing to pay for smart home services (100-3000 €) (see AAL study, 2017). The above results guide the course creation: a mix of learning approaches will be applied (face-to-face, online course, self-study, hands-on) with up to four learning hours per week. Participants from Germany, Ireland, and Spain would be happy to act as eTutors.
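The ordering behind these statements can be checked directly from the column totals in Table 2, where lower priority sums indicate stronger interest. A short illustrative Python check (totals transcribed from the table):

```python
# Priority totals per application area from Table 2 (1 = most interest, 5 = least),
# summed across the five countries; lower totals mean stronger overall interest.
totals = {
    "Automation/comfort": 22,
    "Home security": 12,
    "Health": 15,
    "Energy": 10,
    "Entertainment/communication": 16,
}

# Sort application areas from most to least interesting overall.
ranking = sorted(totals, key=totals.get)
print(ranking)  # "Energy" comes first, "Automation/comfort" last
```

This confirms the text above: energy services have the lowest (best) total and home automation/comfort the highest (worst).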

Following the results of the study, the guiding principles for the course concept will be: easy handling and operation of the applications; explanations and instructions on smart-home issues at a basic level; self-assessing the usefulness of smart home devices; reducing distrust and fears towards the smart home; providing assistance and contacts; and starting at a very basic level with gently increasing difficulty. Although the majority of participants were relatively unfamiliar with smart-home concepts, they were able to point out benefits and concerns regarding this technology. There is a perceived openness from both survey and focus group participants to engage further with smart home technology. This is a promising base for the course offer to be developed in the project.

References

• AAL-Studie (2017): Ambient Assisted Living: Zuhause 4.0 statt Altersheim. Alexander Wild (Hrsg.): Feierabend.de, accessed: 06.05.2019.
• Schreurs, K., Quan-Haase, A., Martin, K. (2016). Problematizing the digital literacy paradox in the context of older adults' ICT use: aging, media, discourse and self-determination. Canadian Journal of Communication, 42(2), 359-377.

Comparison of technology adoption models

Agnieszka Landowska

Gdansk University of Technology, Faculty of Electronics, Telecommunications and Informatics, Poland [email protected]

Keywords: technology adoption models, acceptance of technology


The number of technological solutions for older adults proposed and available on the market is growing. However, some solutions are not widely applied and do not reach their market goals. Moreover, every technological advancement is accepted by some users and rejected by others. There are several technology adoption models that try to explain how and why technologies are adopted and used. Among those widely used to explain how older adults accept technologies, there are both general models and models specific to the group of older users. Among the general ones, I would recommend paying attention to the following: the Technology Acceptance Model (TAM) proposed by Davis (Davis, 1989), the Extended Technology Acceptance Model (TAM2), the Extended (contextual) Technology Acceptance Model (TAM3), and the Unified Theory of Acceptance and Use of Technology (UTAUT). Among those specific to older users are: the Senior Technology Acceptance Model (STAM), the Extended Technology Acceptance Model for the Elderly (ETAME), the Model of Technology Adoption by Older Adults (MATOA), and Elderadopt. The models were identified through a literature review conducted by SHELD-ON participants (Sheld-on, 2019). This paper provides a brief comparison of the models.

Common concepts

The models share a set of common basic concepts that represent, although sometimes named differently, the same variable (attitude or behaviour). The basic concepts include: perceived usefulness, perceived ease of use, attitude towards using, behavioural intention to use, and actual system use. The coverage of these concepts within the models is presented in Table 1.

Concept                       TAM  TAM2  TAM3  UTAUT  STAM  ETAME  MATOA  Elderadopt
Perceived usefulness          x    x     x     x^a    x     x^b
Perceived ease of use         x    x     x     x^c    x     x^d
Attitude towards using        x                       x                   *
Behavioural intention to use  x    x     x     x      x     x             *
Actual system use             x    x     x     x                          *
External variables            x^e  x^f   x^f   x^f    x^f   x^f    x^f    x^f

Table 1: Basic common concepts in technology adoption models. ^a called Performance expectancy; ^b called Appraised Efficaciousness; ^c called Effort expectancy; ^d called Appraised Usability; ^e general concept of "external variables"; ^f model-specific list of variables; * a different set of concepts was proposed that is not directly equivalent to the basic ones, although it covers them as a set.

The most important observation about the basic concepts is the distinction between attitude towards using, behavioural intention to use, and actual system use. Technology developers frequently miss this difference: measuring only the "attitude towards using" concept, they make predictions about sales and are surprised after the product is launched.
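One way to keep this distinction explicit in an evaluation plan is to encode which model covers which concept as data, and query it before deciding what to measure. A minimal Python sketch; the coverage sets below are illustrative examples, not an authoritative transcription of the models:

```python
# Illustrative concept coverage for two models (hypothetical simplification):
# TAM covers all five basic concepts; UTAUT is commonly described as omitting
# the attitude construct while modelling intention and actual use.
coverage = {
    "TAM":   {"perceived_usefulness", "perceived_ease_of_use",
              "attitude", "behavioural_intention", "actual_use"},
    "UTAUT": {"perceived_usefulness", "perceived_ease_of_use",
              "behavioural_intention", "actual_use"},
}

def measures(model: str, concept: str) -> bool:
    """Return True if the chosen model explicitly covers the given concept."""
    return concept in coverage.get(model, set())

# A study that measures only attitude cannot claim to predict actual use:
print(measures("TAM", "attitude"))
print(measures("UTAUT", "attitude"))
```

A checklist like this makes it visible when a planned questionnaire captures attitude but never intention or use, which is exactly the pitfall described above.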

In addition to the basic concepts, each model defines a list of external variables that partially explain the variability of acceptance among users and technologies. The external variables generally fall into the following categories: personal characteristics of a user (age, gender, condition, experience with technology, etc.), technology features (e.g., system quality, task structure), or contextual characteristics (social support, voluntariness of use, etc.). The lists of external variables proposed by the different models are provided in Table 2.


Model       External variables influencing adoption of technologies
TAM         only a general concept of "external variables"
TAM2        voluntariness, subjective norm, experience, image, job relevance, output quality, result demonstrability
TAM3        TAM2 list + others use, system quality, organizational support, prior experience, anxiety, task structure, computer efficacy
UTAUT       voluntariness, experience, social influence, facilitating conditions, gender, age
STAM        social influence, gender, age, education, income, health satisfaction, movement ability
ETAME       experience, gender, age, narrative, social, physical
MATOA       anxiety, social influence, gender, age, education, physical, requisite knowledge, social support, self-concepts, self-management, self-compensation
Elderadopt  perceived stressfulness of individual's unmet needs, individual resilience, persuasiveness of external information, persuasiveness of internal information

Table 2: External variables in technology adoption models. Note: each of the variables explains only a small portion of the variance in the intention of use and the consumer behaviours.

The Elderadopt model, which adapts the psychological theory of appraisal to the adoption of technologies by older adults, is unique in the following respects: (1) it starts not with the technology itself, but with the individual appraisal of unmet needs (a problem); (2) it takes into account alternative solutions (traditional coping solutions); (3) it shows that a user might pursue action strategies (choosing to adopt the proposed or an alternative solution) or mind strategies (changing goals, changing the needs assessment, resignation, or denial). The use of appraisal theory makes the model unique and perhaps the one to use if the context of a technology is to be analysed.

Summing up, the comparison of the models yields some key advice. Firstly, distinguish between the concepts of attitude towards using, behavioural intention to use, and actual system use, and be aware which one you are actually measuring. Secondly, include a reasonable list of external variables, perhaps mixing models if needed. Thirdly, if analysing the problem and context of use rather than a specific technology, consider the Elderadopt model.

References

• SHELD-ON, 2019. State of the Art Report for Smart Habitat for Older Persons

Acknowledgments

This work is supported by COST Action CA16226 "Indoor living space improvement: Smart Habitat for the Elderly".

Choice architecture to promote the adoption of new technologies

Dean Lipovac, Michael D. Burnard

InnoRenew CoE, 6310 Izola, Slovenia; University of Primorska, Andrej Marušič Institute, Koper, Slovenia [email protected], [email protected]

Keywords: technology adoption, promotion, choice architecture, status quo bias


It is difficult to predict to what extent and how fast new technologies will be adopted. Amongst older adults, we know technology adoption lags behind many other demographics (Lee and Coughlin 2015). This poses significant barriers in delivering solutions to a user group that can benefit significantly from recent advances in robotics, artificial intelligence, Internet of Things, and other digital assisted living technologies.

Technology adoption is limited by many factors, including privacy and security concerns, as many of the solutions monitor users (e.g., for fall detection or to predict health events) and therefore involve collecting and sharing personal data (Chaudhuri et al. 2014). Issues around stigma, access to support, and anxiety about using devices are barriers as well (Chaudhuri et al. 2014). Solutions for older adults may be further limited because product designers (typically younger, technologically capable adults) may not be closely familiar with the characteristics and needs of the intended users (older adults) (Lee and Coughlin 2015). Recent strides in requirements engineering may help bridge the gap between the needs perceived by designers and the actual physical and emotional needs of the users (Taveter et al. 2012; Lopez-Lorca et al. 2014). If user needs are addressed well by a solution, what can be done to improve its uptake?

One of the major obstacles stalling technology acceptance across all demographics stems from human cognitive biases. Particularly responsible is the status quo bias, where the current state of affairs is preferred and any change from that state is perceived as a loss (Samuelson and Zeckhauser 1988). Users may therefore resist adopting a new technology even if it can address their needs (Kim and Kankanhalli 2009). In healthcare facilities, for example, potential users sometimes resist adopting technologies that could considerably improve organizational functioning (Bhattacherjee and Hikmet 2007). Widespread implementation of assistive technologies for older adults may face similar adoption resistance. For instance, it has been observed that resistance to change in older adults is associated with lower willingness to adopt preventative mobile health services (Hoque and Sorwar 2017). Promoting the use of technology can thus provide only limited gains if it is based only on technological aspects and does not consider user resistance fuelled by the status quo bias.

A recent study has demonstrated how a slight manipulation of the choice architecture (the way choices are presented) can alleviate such user resistance and nudge people to favour the new technology more frequently (Stryja et al. 2017). Thoughtful design of choice architecture thus has the potential to decrease older adults' resistance to adopting helpful assistive technologies. The presentation of choices could be improved with a greater understanding of general human nature and of the specific characteristics of the target users. The insights elicited during the requirements engineering phase could be useful not only in designing products but also in designing the choices that the intended users will face.

Our analysis of choice architecture design will include its strengths, weaknesses, opportunities, and threats (SWOT). Carefully structured choices can provide guidance in the decision-making process, but they may not be easy to design. And while devising choices offers many opportunities for positive change, it is important to be mindful of the potential risks when the choice architecture does not act in the best interest of users.

In the world of behavioural economics, it has long been recognized that humans are rarely rational agents whose decision-making hinges on careful analysis of the available evidence. Human biases have long been exploited in modern advertising, which often encourages activities that are harmful to health. Similar approaches could be applied to promote health-benefiting activities and tools, including the use of assistive technology, which must be adopted more widely to achieve the greatest impact. One valuable tool may be a carefully designed choice architecture that accompanies a technological solution in the promotion phase.

References

• Bhattacherjee A., Hikmet N. 2007. Physicians' resistance toward healthcare information technology: A theoretical model and empirical test. European Journal of Information Systems, 16: 725–737
• Chaudhuri S., Thompson H., Demiris G. 2014. Fall detection devices and their use with older adults: A systematic review. Journal of Geriatric Physical Therapy, 37, 4: 178–196
• Hoque R., Sorwar G. 2017. Understanding factors influencing the adoption of mHealth by the elderly: An extension of the UTAUT model. International Journal of Medical Informatics, 101: 75–84
• Kim H., Kankanhalli A. 2009. Investigating user resistance to information systems implementation: A status quo bias perspective. MIS Quarterly, 33, 3: 567–582
• Lee C., Coughlin J. F. 2015. Older adults' adoption of technology: An integrated approach to identifying determinants and barriers. Journal of Product Innovation Management, 32, 5: 747–759
• Lopez-Lorca A. A., Miller T., Pedell S., Sterling L., Curumsing M. K. 2014. Modelling emotional requirements.
• Stryja C., Satzger G., Dorner V. 2017. A decision support system design to overcome resistance towards sustainable innovations. European Conference on Information Systems Proceedings: 2885–2895
• Taveter K., Du H., Huhns M. N. 2012. Engineering societal information systems by agent-oriented modelling. Journal of Ambient Intelligence and Smart Environments, 4, 3: 227–252
• Venkatesh V., Thong J. Y. L., Xu X. 2012. Consumer acceptance and use of information technology: Extending the unified theory of acceptance and use of technology. Management Information Systems Quarterly, 36, 1: 157–178

Acknowledgments

The authors gratefully acknowledge the European Commission for funding the InnoRenew CoE project (Grant Agreement #739574) under the H2020 Widespread-Teaming programme and the Republic of Slovenia for funds from the European Regional Development Fund.

Swallowing the bitter pill: Emotional barriers to the adoption of medicine compliance systems among older adults

Vered Shay, Kuldar Taveter, Michal Isaacson

Department of Gerontology, Haifa University, Haifa, Israel [email protected], [email protected], [email protected]

Keywords: medicine compliance systems, older adults, requirements elicitation, motivational modelling, emotional requirements


Medications are the primary tools used to prevent and effectively manage chronic conditions. However, despite their importance and benefits, appropriate medication usage remains a challenge for both patients and healthcare providers. Patients frequently do not adhere to essential medications, resulting in poor clinical outcomes and increased cost of care. The total cost of non-adherence is estimated at 100-300 billion USD each year in the USA alone, including both direct and indirect costs.

Non-adherence also increases the burden imposed on informal family caregivers. Although prevalent in all age groups, the challenge is much more complex when it comes to the elderly population, who suffer from higher rates of chronic conditions.

Various systems have the potential to assist with medicine compliance by organizing medications, reminding patients to take their medications, and even communicating compliance status to medical practitioners or caregivers. These systems range from simple pill-organizing boxes to elaborate systems based on sophisticated technologies. While many solutions exist, medicine compliance remains a problem for many people. Adoption rates of medicine compliance systems are low, and initial enthusiasm is frequently followed by declining usage as time passes.

Appropriation of a medicine compliance system is driven by a variety of emotions, depending on the user's status (whether they are actually using the system) or on the phase of the decision-making process about adopting the system.

A medicine compliance system that is imposed on a user by, for example, informal caregivers is not accepted positively unless there is a wide spectrum of positive emotions that empower the user's decision-making process. Emotions such as assurance, a sense of freedom, anticipation, obligation, and compulsion impact the decision-making process about the adoption of a medicine compliance system. Therefore, emotional requirements should be explicitly considered during system design and implementation to encourage appropriation and avoid rejection of systems.

We have examined behavioural, cognitive, and emotional barriers to using medicine compliance systems in order to understand why such systems have not been the success they were initially expected to be. For a holistic consideration of the different aspects of using medicine compliance systems, including emotional ones, we propose to elicit and represent requirements for medicine compliance systems in terms of three types of goals: a hierarchy of functional goals or 'Do' goals, and, attached to them, quality (non-functional) goals or 'Be' goals and emotional goals or 'Feel' goals (Miller, Pedell, Sterling, Vetere, & Howard, 2012; Lorca, Burrows, & Sterling, 2018; Sterling & Taveter, 2009). Eliciting and representing emotional requirements as first-class citizens has been proposed by Miller, Pedell, Lopez-Lorca, Mendoza, Sterling, & Keirnan (2015). The notation for 'Do', 'Be', and 'Feel' goals is depicted in Figure 1.

Figure 1: Notation for ‘Do’, ‘Be’ and ‘Feel’ goals
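The 'Do'/'Be'/'Feel' structure can be sketched as a simple goal tree. The following Python illustration is hypothetical: the goal names are invented for a medicine-compliance example and are not taken from the authors' models:

```python
# Illustrative sketch of motivational modelling's three goal types:
# each functional ('Do') goal may carry attached quality ('Be') goals
# and emotional ('Feel') goals, and may own subgoals.
from dataclasses import dataclass, field

@dataclass
class Goal:
    do: str                                         # functional goal
    be: list = field(default_factory=list)          # quality goals
    feel: list = field(default_factory=list)        # emotional goals
    subgoals: list = field(default_factory=list)    # child 'Do' goals

root = Goal(
    do="Support medicine compliance",
    be=["reliable", "easy to use"],
    feel=["assured", "in control"],
    subgoals=[
        Goal(do="Remind patient to take medication",
             be=["timely"], feel=["unobtrusive"]),
        Goal(do="Report compliance status to caregiver",
             be=["private"], feel=["trusted"]),
    ],
)

def count_goals(g: Goal) -> int:
    """Count the functional goals in the hierarchy."""
    return 1 + sum(count_goals(s) for s in g.subgoals)

print(count_goals(root))  # 3
```

The point of the structure is that 'Be' and 'Feel' annotations travel with the functional goal they qualify, so emotional requirements cannot silently drop out of the design.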


Figure 2 shows a part of the results of the requirements elicitation sessions for an "ideal" medicine compliance system that were conducted by the authors of this paper based on an extensive literature review. The goal model represented in Figure 2 and the other goal models created will be explained in the presentation.

Figure 2: The results of applying the 'Do'/'Be'/'Feel' method to medicine compliance systems based on the literature review

References

• Lorca, A. L., Burrows, R., Sterling, L. 2018. Teaching motivational models in agile requirements engineering. In 2018 IEEE 8th International Workshop on Requirements Engineering Education and Training (REET) (pp. 30-39). IEEE.
• Miller, T., Pedell, S., Lopez-Lorca, A. A., Mendoza, A., Sterling, L., & Keirnan, A. (2015). Emotion-led modelling for people-oriented requirements engineering: The case study of emergency systems. Journal of Systems and Software, 105, 54-71.
• Miller, T., Pedell, S., Sterling, L., Vetere, F., Howard, S. 2012. Understanding socially oriented roles and goals through motivational modelling. Journal of Systems and Software, 85(9), 2160-2170.
• Sterling, L., Taveter, K. 2009. The art of agent-oriented modeling. MIT Press.

Acknowledgments

We express our gratitude to the COST Action CA16226 "Indoor living space improvement: Smart Habitat for the Elderly", which funded the Short Term Scientific Mission in the course of which the work reported in this paper was performed.

An edge computing robot experience for automatic elderly mental health care based on voice

C. Yvanoff-Frenchin¹, V. Ramos², T. Belabed², C. Valderrama²

¹ ESIEE, France; ² UMons, Belgium

[email protected], [email protected], [email protected], [email protected]

Keywords: edge computing, mental health care, voice-enabled techniques, embedded systems


Mental health care and diagnosis are today migrating towards mobile solutions (Bakker et al. 2016). The interest and techniques were already under study from multiple aspects, from acceptability to clinical efficacy, through targeted therapies and clinical benefits (Simon and Ludman 2009). Mobile applications provide more accessible support (Watts and Andrews 2014). This becomes particularly interesting given that people dealing with mood, stress, or anxiety do not always seek professional help or get care when it is really needed (Mojtabai et al. 2002). On the other hand, care or help is not always available when needed, for reasons such as location, financial means, or societal reasons (Collin et al. 2011). We need open platforms driven by specialists, in which queries can be created and collected over long periods and a diagnosis made, based on rigorous clinical follow-up. In this work, we developed a multi-language, edge-computing embedded robot interface that helps evaluate mental health by interacting through questions.

Voice-enabled technologies are leading in multiple domains, integrated in products that are part of our everyday life, from cars to home automation (Brownstein et al. n.d.). Although commercial products such as Microsoft Cortana, Amazon, Google Home, or Sonos are not always open to customization, they are supported by development platforms, as in the case of Amazon AWS (AWS 2019), Google (Cloud 2019), and IBM Watson (IBM.Watson 2019), for example. According to (Van Der Straten, n.d.), the healthcare sector is the most popular category (47.1%) for vertical voice-based applications. Telemedicine encourages conversation applications in the health field, particularly where hospitals have a strong incentive to provide high-quality follow-up care; much of the high cost of physicians and caregivers is spent on hours of data collection in electronic health records. The voice health sector also extends to seniors who wish to stay at home, especially those who refuse mobile or smart technologies requiring dexterity or good vision (Brownstein, Lannon, and Lindenauer n.d.). Aging at home implies socializing, AI-based activity-oriented interfaces, and daily monitoring services. Robot-based patient-caregiver communication saves time and therefore increases the productivity of already planned tasks such as reminders and appointments. Physician notes, such as the Electronic Health Record (EHR) and patient feedback, now use voice technology and AI-based natural language scribes (Kiroku 2019) on multiple platforms (PCs, smartphones), including new microphones and wearable voice interfaces (Notable 2019).


Figure 1: Healthcare home hands-free voice device system architecture. The edge-computing embedded system is composed of three sections: the Jupyter web interface, the Processing System (PS), and the Programmable Logic (PL). The PL part uses real-time audio processing interfaces for user interaction through a headset. Jupyter allows results to be retrieved in text and graphical form. The first task of the system is to recognize the user's language; the second is to process the question list that is used to fill the EHR collector and database with the user's answers.

Healthcare voice-enabled mobile applications such as MIMOSYS (Medical-pst 2019) and CHADmon (Antosik-Wójcinska et al. 2019) monitor mental health or disorders related to emotional changes through analysis of the human voice. Several healthcare platforms work with Amazon Alexa and Google Assistant smart speakers, for instance Cuida Health LISA (CuidaHealth 2019). Many are hands-free voice devices, such as Rosie Reminder (Smpltec 2019) and ElliQ (Elliq 2019); others are wearable, such as Notable (Notable 2019). In this work, we developed a multi-language robot interface that helps evaluate mental health by interacting through questions. The specialist can propose questions, as well as receive users' answers, in text form. The robot can automatically interact with the user in the appropriate language. It can process the answers, and, under the guidance of a specialist, questions and answers can be oriented towards the desired therapy direction. To build a hands-free voice device, we target an edge computing embedded system (Fig. 1). The prototype was implemented on an embedded device, a Xilinx PYNQ-Z1 board (Xilinx 2019), designed to be used as an open-source framework anywhere at home. The system combines hardware libraries for audio processing and a Python program running on an ARM A9 CPU. The software part is programmed using Python libraries from Google Cloud (Cloud 2019) (translation) and IBM Watson (IBM.Watson 2019) (speech, empathy, and natural language), all in a Jupyter Notebook development environment. The platform can be accessed through a web server hosting the Jupyter Notebooks design environment, which includes the IPython kernel and packages running on a Linux OS. The system is now available for specialists to create queries and answers through a web-based interface. Queries can be created and collected over long periods and the diagnosis made, based on rigorous clinical follow-up.
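The control flow described above (detect the user's language, then run the specialist's question list and collect answers for the record) can be sketched in plain Python. This is an illustrative stub, not the authors' implementation: the real prototype calls Google Cloud and IBM Watson services, which are replaced here by placeholder functions, and the question lists are invented:

```python
# Minimal sketch of the robot's interaction loop. Cloud services (language
# detection, speech-to-text) are stubbed out; all names here are hypothetical.
QUESTIONS = {
    "en": ["How did you sleep last night?", "How is your mood today?"],
    "fr": ["Comment avez-vous dormi cette nuit ?",
           "Quelle est votre humeur aujourd'hui ?"],
}

def detect_language(utterance: str) -> str:
    """Stub standing in for the cloud language-detection step."""
    return "fr" if "bonjour" in utterance.lower() else "en"

def run_session(greeting: str, answer_fn) -> dict:
    """Ask the question list in the detected language and collect the
    transcribed answers into a record destined for the EHR collector."""
    lang = detect_language(greeting)
    return {"language": lang,
            "answers": {q: answer_fn(q) for q in QUESTIONS[lang]}}

# answer_fn would wrap speech capture + transcription on the real device.
record = run_session("Bonjour", lambda q: "(transcribed answer)")
print(record["language"])  # 'fr'
```

On the actual board, `detect_language` and `answer_fn` would be backed by the PL audio pipeline and the cloud speech/translation clients, with the resulting record written to the follow-up database.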

References

• Antosik-Wójcinska, A.Z., M. Chojnacka, M. Dominiak, and Ł. Święcicki. 2019. "The Use of Smartphones in the Management of Bipolar Disorder: Mobile Apps and Voice Analysis in Monitoring of Mental State and Phase Change Detection." European Neuropsychopharmacology 29: S528–29.
• AWS. 2019. "Amazon Polly." 2019.
• Bakker, David, Nikolaos Kazantzis, Debra Rickwood, and Nikki Rickard. 2016. "Mental Health Smartphone Apps: Review and Evidence-Based Recommendations for Future Developments." JMIR Mental Health 3 (1): e7.
• Brownstein, John, Jennifer Lannon, and Sarah Lindenauer. n.d. "37 Startups Building Voice Applications for Healthcare." MobiHealthNews. Accessed August 31, 2019.
• Cloud, Google. 2019. "Google Cloud: Cloud Translation: Translation Client Libraries." 2019.
• Collin, Philippa J, Atari T Metcalf, Justine C Stephens-Reicher, Michelle E Blanchard, Helen E Herrman, Kitty Rahilly, and Jane M Burns. 2011. "ReachOut.Com: The Role of an Online Service for Promoting Help-Seeking in Young People." Advances in Mental Health 10 (1): 39–51.
• CuidaHealth. 2019. "You Can Make It Fun For Older Adults." 2019.
• Elliq. 2019. "ElliQ, the Sidekick for Happier Aging – Intuition Robotics." 2019.
• IBM.Watson. 2019. "IBM Watson Products and Services." 2019.
• Kiroku. 2019. "Kiroku: Automated Clinical Record Keeping." 2019.
• Medical-pst. 2019. "MIMOSYS: Voice Analysis of Pathophysiology." 2019.
• Mojtabai, R., M. Olfson, and D. Mechanic. 2002. "Perceived Need and Help-Seeking in Adults With Mood, Anxiety, or Substance Use Disorders." Archives of General Psychiatry 59 (1): 77.
• Notable. 2019. "Notable: Autopilot for Health Care." 2019.
• Simon, Gregory E, and Evette J Ludman. 2009. "It's Time for Disruptive Innovation in Psychotherapy." The Lancet 374 (9690): 594–95.
• Smpltec. 2019. "Reminder Rosie: SMPL Technology." 2019.
• Straten, Savina Van Der. n.d. "Voice Tech Landscape: 150+ Infrastructure, Horizontal and Vertical Startups Mapped and Analysed." Medium.
• Watts, Sarah E., and Gavin Andrews. 2014. "Internet Access Is NOT Restricted Globally to High Income Countries: So Why Are Evidenced Based Prevention and Treatment Programs for Mental Disorders so Rare?" Asian Journal of Psychiatry 10 (August): 71–74.
• Xilinx. 2019. "PYNQ: Python Productivity for Zynq." 2019.

Acknowledgments

The authors would also like to acknowledge the contribution of the COST Action CA16226 "Indoor living space improvement: Smart Habitat for the Elderly".


EMBnet Workshop

Big Data analysis and reproducibility: new challenges and needs of the post-genomics era

October 18, 2019 Ohrid (Republic of North Macedonia)

Programme Committee Domenica D’Elia Erik Bongcam-Rudloff Dimitrios Vlachakis Sissy Efthimiadou Eleni Papakonstantinou

Maja Zagorščak

ICT Innovations 2019 Web Proceedings ISSN 1857-7288

Towards Integration of Exposome Data and Personal Health Records in the Age of IoT

Savoska Snezana [0000-0002-0539-1771], Blagoj Ristevski [0000-0002-8356-1203], Natasha Blazheska-Tabakovska [0000-0002-8715-3119] and Ilija Jolevski [0000-0003-2262-9638]

"St. Kliment Ohridski" University - Bitola, Faculty of Information and Communication Technologies - Bitola, ul. Partizanska bb 7000, RN Macedonia
[email protected], [email protected], [email protected], [email protected]

Abstract. Nowadays we are facing a data flood in many areas. Big data come from numerous sources such as human activities, measuring instruments, and many appliances connected to computers or smartphones. One of the most challenging topics in the next decade will be how the combination of genome and exposome data will contribute to revealing the risks for particular diseases. According to medical scientists, the exposome includes all environmental exposure factors, from chemical and non-chemical agents to socio-behavioural and psychological factors such as stress and diet, and endogenous and exogenous factors across the whole lifespan. The growth of mobile and ubiquitous computing technologies contributes to increasing the number of records regarding the personal health and habits of patients. The Internet of Things (IoT) includes the development of wearable measurement sensors connected via Bluetooth, which are capable of capturing and storing health-related data intended for patient health records. The exposome is a healthcare and medicine concept that implies an interdisciplinary and integrated approach across many science domains, including epidemiology, computing, environmental sciences, toxicology, and social science. We aim to integrate the data collected from various sensors and detectors into the patient health record to provide clinicians with more elements for better disease prognosis, diagnosis, and treatment.

Keywords: Exposome · Omics Data · Personal Health Records · Health Informatics · Internet of Things.

1 Introduction

One of the emerging areas where big data can give great results is healthcare and medicine, which use computer science as a tool for gaining insights and analysis from a wealth of collected personal health data, medical data, and various omics data. With the concept of precision medicine and the creation of digital records for healthcare in patients' everyday life, healthcare organizations have adopted computer science and information systems as tools to collect and analyze data [14]. With technology advances and human genome sequencing, they have started to use more and more data


that are stored in huge repositories in many places and in various formats, using a wide range of applications and medical and measuring instruments.

When Christopher P. Wild explained that genome theory does not elucidate many chronic diseases and proposed using an environmental complement to the genomics data, it was just the beginning of gathering different sets of omics data. These data sets were defined as the exposome, the whole of exposures throughout the human lifespan [23]. They had to be used together with genome data to determine the risk of diseases. Because data collected in all these manners have different formats, types, and attributes, these data are usually analyzed by considering the 6 V's big data characteristics: volume, velocity, variety, value, variability, and veracity [17]. Hence, they demand the attention of scientists from a wide range of disciplines so that they can be analyzed in a suitable manner to gain useful insights. One of the biggest challenges in the next decade will be how the combination of genome and exposome data (as the whole of human exposures from birth to death) will contribute to revealing the risk factors for particular diseases. According to medical scientists, the exposome includes environmental exposure factors from chemical and non-chemical agents, socio-behavioural and psychological factors such as stress and diet, and endogenous and exogenous factors across the whole lifespan [4]. Christopher P. Wild [23] defined three scopes of the exposome: general (e.g., social capital, education), internal (e.g., metabolism, gene expression), and specific external (e.g., chemicals, noise).

The rest of the paper is organized as follows. In Section 2 we review some aspects related to the PHR (personal health record) and EHR (electronic health record). In Section 3 we describe the concept of the exposome. In Section 4 we highlight the Internet of Things (IoT) concept, used as sensors for feature measurement as the exposome. In Section 5 we explain the proposed relation between the PHR and the exposome. Finally, in Section 6 we provide concluding remarks.

2 Personal Health Records and Electronic Health Records

Several challenging areas today include the relationships among healthcare and e-health, evidence-based medicine, telemedicine, and pervasive medicine [9]. In the last decade, population migration and travel have been increasing over time. Patients' needs for healthcare services are shifting toward electronic health services, which provide medical care from anywhere. There is a rising need for medical facts about patients to help medical practitioners make the best decisions about diagnoses, treatments, and patient care [1, 3], in the spirit of evidence-based medicine. Data can be obtained from many measuring instruments, prescribed documents, and medical assessments. The best way to support these new trends is to collect healthcare data from all of a patient's healthcare activities, medical examinations, laboratory results, prescriptions, and referrals, stored in healthcare digital repositories in electronic format as an EHR. Unfortunately, EHRs, as repositories of patients' healthcare data, are the property of healthcare providers and are usually not accessible to the patients. Patients need to carry their own healthcare data with them when travelling abroad, so that they can obtain better medical assistance wherever they are.


They need their PHR saved and accessible, so that it can be used from anywhere by patient-selected medical teams. The concept of the PHR is nowadays associated with web-enabled technology intended for e-health, and it is, moreover, a basis for evidence-based personal healthcare. PHRs offer many advantages, such as improving diagnostic efficiency and providing healthcare data management and real-time information management. For physicians, adopting PHRs is very important because they provide access to the patient's medical records, which can serve as a reference for diagnoses [1]. All stored data and medical information can play a positive role in medical practitioners' decision making for patient healthcare.

The growth of mobile and ubiquitous computing technologies contributes to the increasing number of records regarding patients' personal health. The IoT includes wearable measurement sensors connected via Bluetooth that are capable of capturing and storing health-related data in a PHR [2]. Techniques for processing healthcare data include machine learning, pattern recognition, expert systems, statistics, applied mathematics, and artificial intelligence [15]. However, there are many obstacles to managing such huge distributed databases of medical records because of their complexity, their distribution, and the lack of interoperability.

Nowadays, the world is facing global epidemics of cancer, influenza, AIDS, diabetes, and obesity, which increases the demand for healthcare professionals. Patients need more attention and education to cope with their situation. The situation also demands worldwide standards for PHRs, because of the increased mobility of the world population and the need of evidence-based medicine for patients' data [2]. Since 2006, the healthcare community has been working on standards for PHRs.
To this end, in 2012 ISO (the International Organization for Standardization) introduced ISO/TR 14292 for EHRs and applied the HL7 standard [7]. Although PHRs and EHRs differ in their contents, there is an increasing need to integrate these data into electronic medical records that accurately contain all of a patient's health data throughout their life, as shown in Fig. 1. Accordingly, emerging EU projects [21] intend to create a PHR owned by the patient. This will be a starting point for using PHRs abroad and for giving patients access to their PHRs stored in a cloud environment [21]. When working with patients' data, data security and privacy issues have to be solved. A PHR accepts data obtained from health-related equipment such as accelerometers, gyroscopes, wireless devices, wristbands, and smart watches as IoT devices. Data collected from these sensors can be saved in the PHR and protected according to national data privacy law, to help in patient risk detection.

Fig. 1. Toward integration of EHR and PHR with IoT sensors' data as exposome.
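As a toy sketch of the integration pictured in Fig. 1, a patient-owned PHR entry might combine a reference into a provider's EHR with readings from Bluetooth-connected IoT sensors. All class and field names below are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class SensorReading:
    """One measurement from a Bluetooth-connected wearable sensor."""
    sensor: str          # e.g. "heart_rate", "pm25"
    value: float
    unit: str
    taken_at: datetime

@dataclass
class PHREntry:
    """A patient-owned record combining an EHR reference and exposome readings."""
    patient_id: str
    ehr_reference: str                    # pointer into the provider's EHR
    readings: list = field(default_factory=list)

    def add_reading(self, reading: SensorReading):
        self.readings.append(reading)

entry = PHREntry(patient_id="p-001", ehr_reference="ehr://provider/123")
entry.add_reading(SensorReading("heart_rate", 72.0, "bpm",
                                datetime.now(timezone.utc)))
```

In practice such records would additionally need encryption, access control, and conformance to a standard such as ISO/TR 14292 or HL7, which this sketch omits.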

3 Exposome Theory

The exposome concept includes all traditional measurements in healthcare, known as bio-monitoring methods, as well as new types of measurements using sensors. The IoT concept includes a variety of wearable sensors connected to computers or smartphones [13]. There is no limit to the development of sensors as measuring instruments that provide data such as spatio-temporal data with preset measurement units, default values, and value ranges.

The exposome is a broad concept in healthcare and medicine. It implies interdisciplinary, integrated approaches from different domains such as epidemiology, computing, analytics, environmental sciences, toxicology, and social science. Exposome theory concerns the totality of exposure over the whole lifetime, and it is connected with the use of sensors and biological biomarker measurements based on high-throughput omics technologies, in order to predict the effects of these exposures on human health, worldwide and over an individual's whole lifespan. Some epidemiologists state that exposome data have to be analyzed and connected with particular groups of patients who are exposed to factors that can influence their health in ways that cannot be explained by genetics alone [23].

Precision medicine focuses on adapting treatments based on the genetic profile of tumors, including biomarker but also exposome data. Large-scale studies contribute to the understanding of gene-environment interactions and underpin epigenetics theory. It is now well understood that there are complex interactions between genetics and environmental exposures that can be used to clarify the etiology of many diseases. Nowadays, epidemiologists consider oncometabolites as exposome factors. Some research is dedicated to genome-wide association studies or environment-wide


association studies [8]. Such studies connect genetic polymorphisms with human diseases or phenotypes, thereby defining the endophenotype of each individual. To clarify the exposome-genome paradigm, a great deal of data has to be considered, and this research field is only at its beginning [2]. Researchers have to consider matrix selection, variability of exposure, and targeted and untargeted analysis to create more reliable exposure data as exposome data [4]. They also have to use bioinformatics techniques and omics technologies that have the potential to discover unknown functional relationships among genes and diseases. Many initiatives provide accurate and reproducible laboratory measurements, and groups of scientists gathered at the National Institute of Environmental Health Sciences (NIEHS) are dedicated to storing these data in accessible databases such as METLIN (Metabolite and Tandem MS Database) and HMDB (Human Metabolome Database) [21]. The number of these omics data sets increases over time. Databases and repositories with human exposome data are a good starting basis for providing data both to environmental health scientists and to researchers who study the influence of exposure factors on the human genome.

Since many traditional biomonitoring methods, such as targeted biological analyses, are available in healthcare providers' databases, the resulting data can be used as key exposome data for the patients. Data from epidemiologic studies are also stored in digital repositories, and they can be included in patients' healthcare dossiers, as EHR or PHR data [18]. Some studies can provide a wide range of chemical exposure data for selected locations at particular points in time [4]. Data from analyses of patients' blood and urine for the presence of particular chemicals after short-time exposures can also be considered.
However, exposome approaches differ from these traditional biomonitoring methods because they cover all significant potential exposures, whether exogenous (pollutants, noise, UV rays, diet, drugs) or endogenous (hormones, human or microbial metabolites) [16]. Christopher P. Wild stated that incorporating the exposome paradigm into traditional biomonitoring methods can improve exposure assessment in many ways [24]. Exposome theory also anticipates that all currently available measurements of environmental factors that influence human health will be considered in the PHR. Data from traditional bio-monitoring methods in medical labs, enriched with measurements of individual patients' parameters by IoT-connected devices, will also be available for further analysis. All these data can be used to characterize individual behavior and to improve the understanding of the exposure profile at the individual patient level.

European exposome studies such as HELIX, HEALS, and EXPOsOMICS, listed in CORDIS (the Community Research and Development Information Service), have already started to capture this type of information [22]. Some universities have also founded centers for human exposome research and developed infrastructure that combines hybrid and biomonitoring approaches for data collection. They further combine these techniques with untargeted omics analyses such as metabolomics, proteomics, transcriptomics, and epigenomics, which can significantly help to uncover the biologically meaningful influence of the exposome on the human genome. Metabolome and redox proteome data can also provide useful information for improving the composition of nutrients in individuals' diets, based on the features of the environment in which the individuals live [6].


4 IoT Technology and Exposome Data

As smartphone technologies become more accessible and widespread, low-cost portable sensors, software applications, and microcontrollers are becoming a "standard" for short- and long-term tele-monitoring. These applications provide wireless data access, usually in the cloud, where huge amounts of personal data are stored. This is a major technological shift, because they provide many types of data, such as data on location, time, activities, habits, and behaviors, referred to as exposome data. Location can be a very important factor in assessing the duration of exposure to a hazard, the social environment, climate, and other general external exposures, and even microbiomics [11]. It is very important to define criteria for using sensors to collect data for further analysis. With smartphones, the collected data sets can include location, time, the circumstances a person is in, activities, whether a person is indoors or outdoors, their movement, etc. This is straightforward because smartphones usually have additional sensors and embedded algorithms that can help determine whether a person is indoors or outdoors, whether they are running, and what the ambient conditions are, such as temperature, relative humidity, light, or noise [10]. They usually have I/O (input/output) detectors, magnetometers, and various other sensors with predefined accuracy [15]. Location can be related to exposure to air pollutants. These data can greatly help in predicting the personal exposome. Physical activity is a very important risk factor for most diseases and influences environmental exposure. In that case, inhalation and metabolism can be measured as energy expenditure through breathing rate and inhalation of pollutants. Many smartphone sensors, such as tri-axial accelerometers combined with gyroscopes, can track physical activity. Steps can be counted, and the metabolic equivalent estimated.
Heart rate, the RR interval (the small variation in the intervals between successive heartbeats), breathing rate, acceleration, activity level, distance covered, etc. can also be measured. For instance, a lack of physical activity can result in obesity and diabetes.
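As a rough illustration, the heart-rate measures mentioned above can be derived from a series of RR intervals. The sketch below computes mean heart rate and RMSSD, a commonly used heart-rate-variability statistic; the function name is our own illustrative choice:

```python
import math

def heart_rate_metrics(rr_intervals_ms):
    """Compute mean heart rate (bpm) and RMSSD, the root mean square of
    successive differences, from RR intervals given in milliseconds."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    mean_rr = sum(rr_intervals_ms) / len(rr_intervals_ms)
    mean_hr = 60_000.0 / mean_rr  # 60 000 ms per minute
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_hr, rmssd

# five RR intervals recorded by a hypothetical wearable chest strap
hr, rmssd = heart_rate_metrics([800, 810, 790, 805, 795])  # hr = 75.0 bpm
```

A real device would additionally filter artifacts (missed or spurious beats) before computing such statistics.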

5 Relationships between PHR and Exposome

The motivation for using software applications that rely on sensor data is usually to provide insight for a healthier lifestyle or to provide health alerts. Fitness trackers and diet-control applications are readily available. These applications usually work as questionnaires that have to be filled in after meals, or they track dietary intake for weight-loss or fitness purposes. Usually, they count the energy burnt and the metabolic rate for food intake [13].

Environmental and climate conditions indoors are usually measured with indoor monitoring sensors that allow remote access to home measurement data. Indoor and outdoor air quality can be measured, as well as temperature, humidity, mould, ventilation rate, and chemical reactions producing secondary pollutants such as formaldehydes and CO2. Air quality can be measured with gas sensors, either metal-oxide or electrochemical (for CO, O3, NO2, ammonia, hydrogen sulfide, etc.), which estimate gas concentrations, and with sensors for total volatile organic compounds (TVOCs).


Humidity can be measured thanks to the sensitivity of metal-oxide sensors and its influence on their temperature rise rate [13]. Higher-end sensors using lasers can obtain better results, employing optical methods with photodiodes for indoor monitoring of very small airborne particles; this is very effective for detecting cigarette smoke concentrations [13]. Other, specialized sensors measure micro- and nano-particles. Measurements of pollutants such as PM10 and PM2.5 are connected with different diseases, such as chronic obstructive pulmonary disease (COPD).

Another type of smartphone sensor is the noise sensor, intended for hand-held sound measurement at fixed locations, complemented by noise dosimeters for measuring personal noise exposure. Many such devices are not standardized or calibrated, and are therefore unsuited for workplace or environmental noise measurement. A few smartphones already offer calibration for personal noise measurement. Noise can be very harmful to human health, especially to the hearing system, and can cause sleeping disorders; it is especially harmful when exposure is long and intense. Damage to the hearing system can be predicted from measurements of noise intensity and duration.

Movement sensors can also measure a person's trajectory, which can be associated with location as well as with the level of hazards such as air pollution, noise, and exposure to UV rays. On the other hand, food contamination can be connected with the level of soil contamination with pesticides and heavy metals. There are also sensors for stress detection. Exposure to radiation can be associated with appropriate measurements of environmental factors. All these data should be stored in cloud databases as IoT data and connected with the patient's PHR, individually or as a group if the data come from outdoor sources.
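One established way to quantify the hearing-damage risk from noise intensity and duration is a daily noise dose. The sketch below follows the OSHA occupational formula with a 90 dBA criterion level and a 5 dB exchange rate; whether this particular criterion applies to consumer dosimeter data is an assumption on our part:

```python
def permissible_hours(level_dba, criterion=90.0, exchange_rate=5.0):
    """Permissible exposure time T (hours) at a given A-weighted level,
    per the OSHA formula T = 8 / 2**((L - criterion) / exchange_rate)."""
    return 8.0 / 2.0 ** ((level_dba - criterion) / exchange_rate)

def noise_dose(exposures):
    """Daily noise dose in percent: 100 * sum(C_i / T_i), where C_i is the
    time actually spent (hours) at level L_i.  A dose above 100 % exceeds
    the permissible daily exposure."""
    return 100.0 * sum(hours / permissible_hours(level)
                       for hours, level in exposures)

# e.g. 4 h at 90 dBA plus 2 h at 95 dBA reaches exactly the daily limit
dose = noise_dose([(4.0, 90.0), (2.0, 95.0)])  # 100.0
```

European regulations use a 3 dB exchange rate and an 85 dBA action level instead, which the `criterion` and `exchange_rate` parameters can accommodate.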
When all data points are available, the overall risk factors for human health can be considered in more detail. Gathering exposome data, as the different exposure variables of one person or a group, will enable the creation of more accurate machine learning models that can find connections between seemingly unrelated exposures and actions of a person and his or her health risks and diseases, and suggest corrective actions to avoid or mitigate those risks.

One case of integrating IoT data collected from wearable sensors with PHRs is the IPA2 project Cross4all (Cross-border initiative for integrated health and social services promoting safe ageing, early prevention and independent living for all). Sensors, provided as project equipment, are connected with mobile devices via Bluetooth and send their data to the Cross4all cloud. The collected data are to be considered when medical practitioners decide about a patient's health condition in the cross-border area. Many traditional bio-monitoring data obtained with conventional medical equipment from public and private healthcare providers' databases (from patients' EHRs) are also collected. Data obtained from mobile wearable sensors connected to patients for short tele-monitoring periods can be very useful to medical specialists making decisions about patient health in the era of evidence-based medicine. The project aims to create a database with PHR data for the project's participants, as well as to collect a large amount of health-related exposome data to be integrated with the PHR data. Connections to environmental databases that store data on air pollutants, outdoor weather and climate conditions, and the presence of nano-particles, as environmental exposome data linked to the patient's location, can also provide very important information to medical practitioners for patients with chronic diseases. Tracking all these data can enable a better risk assessment of patient health.
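As a toy illustration of the kind of PHR-exposome integration described above, a machine learning pipeline would first merge per-patient PHR fields and exposome measurements into one feature vector. All field names here are hypothetical:

```python
def build_feature_vector(phr, exposome, feature_order):
    """Merge a PHR record and exposome measurements (both plain dicts)
    into a single ordered feature vector; missing values become None."""
    merged = {**phr, **exposome}   # exposome values win on name clashes
    return [merged.get(name) for name in feature_order]

phr = {"age": 63, "systolic_bp": 145, "copd_diagnosed": 1}
exposome = {"pm25_annual_mean": 31.2, "noise_dose_pct": 120.0}
features = build_feature_vector(
    phr, exposome,
    ["age", "systolic_bp", "copd_diagnosed",
     "pm25_annual_mean", "noise_dose_pct"],
)
```

Such vectors could then feed any standard risk model; handling missing values and harmonizing units across data sources is the hard part in practice.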

6 Conclusions

Smartphones with IoT sensors raise many potential data security and privacy issues. Indeed, one of the main problems is data ownership; anonymizing the data can be one way to resolve it. There are thus many challenges when data are used for evidence-based medicine, such as data standardization, harmonization, privacy, and confidentiality. Many procedures have to be put in place to support the wider use of personal sensors as part of the e-health and IoT concepts. When sensors are connected with a PHR, the problem becomes more complex because of the quantity of complex data that have to be secured and used for medical purposes only.

Another important issue is data collection. The collection, storage, processing, and analysis of this kind of data demand great effort because of the complexity, volume, variety, velocity, value, variability, and veracity of the data. This type of data is also very sensitive, because of the possibility of abuse and misuse, and because it contains factors that allow human behavior recognition [2]. A further issue is how these sensor data should be used to highlight the influence of environmental factors on human health, and how exposome data can be used in practice. Many ongoing exposome projects aim to answer how to balance the possibilities of new sensor technologies against more conventional, well-tested methods of bio-monitoring data collection. Exposome tools can also enable the measurement of many environmental parameters and allow improved quantification of individual and population risk factors [20]. Data analysis can also answer these questions about the exposome and point out some aspects of its usage. Semantic approaches can likewise help in deriving information and the clinical context for the evaluation and usage of exposome data [7].
However, there is a need to integrate the data collected from various environmental sensors and detectors into the PHR to provide better disease prognosis, diagnosis, and treatment, which we aim to achieve with the Cross4all project.

Acknowledgments

Part of the work presented in this paper has been carried out in the framework of the project "Cross-border initiative for integrated health and social services promoting safe ageing, early prevention and independent living for all (Cross4all)", which is implemented in the context of the INTERREG IPA Cross Border Cooperation Programme CCI 2014 TC 16 I5CB 009 and co-funded by the European Union and national funds of the participating countries.


References

1. Andreu-Perez, J., Poon, C.C., Merrifield, R.D., Wong, S.T., Yang, G.Z.: Big data for health. IEEE Journal of Biomedical and Health Informatics 19(4), 1193–1208 (2015)
2. Barouki, R., Audouze, K., Coumoul, X., Demenais, F., Gauguier, D.: Integration of the human exposome with the human genome to advance medicine. Biochimie 152, 155–158 (2018)
3. Smith, M.T., McHale, C.M., de la Rosa, R.: Using exposomics to assess cumulative risks from multiple environmental stressors. In: Unraveling the Exposome, pp. 3–22. Springer (2019)
4. Dennis, K.K., Marder, E., Balshaw, D.M., Cui, Y., Lynes, M.A., Patti, G.J., Rappaport, S.M., Shaughnessy, D.T., Vrijheid, M., Barr, D.B.: Biomonitoring in the era of the exposome. Environmental Health Perspectives 125(4), 502–510 (2016)
5. Gai, K., Qiu, M., Chen, L.C., Liu, M.: Electronic health record error prevention approach using ontology in big data. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, pp. 752–757. IEEE (2015)
6. Go, Y.M., Jones, D.P.: Redox biology: interface of the exposome with the proteome, epigenome and genome. Redox Biology 2, 358–360 (2014)
7. Fan, J.W., Li, J., Lussier, Y.A.: Semantic modeling for exposomics with exploratory evaluation in clinical context. Journal of Healthcare Engineering 2017 (2017)
8. Fernandes, J., Hu, X., Smith, M.R., Go, Y.M., Jones, D.P.: Selenium at the redox interface of the genome, metabolome and exposome. Free Radical Biology and Medicine 127, 215–227 (2018)
9. Househ, M., Kushniruk, A.W., Borycki, E.M.: Big Data, Big Challenges: A Healthcare Perspective: Background, Issues, Solutions and Research Directions. Springer (2019)
10. Institute of Medicine (US), Committee on the Recommended Social and Behavioral Domains and Measures for Electronic Health Records: Capturing social and behavioral domains and measures in electronic health records: phase 2. National Academies Press (2014)
11. ISO: Health informatics – Personal health records: Definition, scope and context. ISO/TR 14292, Switzerland (2012), https://www.iso.org/obp/ui/#iso:std:iso:tr:14292:ed-1:v1:en
12. Lindley, J., Coulton, P., Cooper, R.: Why the internet of things needs object orientated ontology. The Design Journal 20(sup1), S2846–S2857 (2017)
13. Loh, M., Sarigiannis, D., Gotti, A., Karakitsios, S., Pronk, A., Kuijpers, E., Annesi-Maesano, I., Baiz, N., Madureira, J., Oliveira Fernandes, E., et al.: How sensors might help define the external exposome. International Journal of Environmental Research and Public Health 14(4), 434 (2017)
14. Lopez-Campos, G., Merolli, M., Martín-Sánchez, F.: Biomedical informatics and the digital component of the exposome. Studies in Health Technology and Informatics 245, 496–500 (2017)
15. Morgenthaler, J.: Moving toward an open standard universal health record (2011), http://www.smart-publications.com/articles/moving-toward-an-open-standard-universal-health-record
16. Rappaport, S.M., Smith, M.T.: Environment and disease risks. Science 330(6003), 460–461 (2010)
17. Ristevski, B., Chen, M.: Big data analytics in medicine and healthcare. Journal of Integrative Bioinformatics 15(3) (2018)


18. Roehrs, A., Da Costa, C.A., da Rosa Righi, R., De Oliveira, K.S.F.: Personal health records: a systematic literature review. Journal of Medical Internet Research 19(1), e13 (2017)
19. Savoska, S., Jolevski, I.: Architectural model of e-health system for support of the integrated cross-border services. ISGT 2018 Conference (2018)
20. Smith, M.T., McHale, C.M., de la Rosa, R.: Using exposomics to assess cumulative risks from multiple environmental stressors. In: Unraveling the Exposome, pp. 3–22. Springer (2019)
21. Tautenhahn, R., Cho, K., Uritboonthai, W., Zhu, Z., Patti, G.J., Siuzdak, G.: An accelerated workflow for untargeted metabolomics using the METLIN database. Nature Biotechnology 30(9), 826 (2012)
22. Vineis, P., Chadeau-Hyam, M., Gmuender, H., Gulliver, J., Herceg, Z., Kleinjans, J., Kogevinas, M., Kyrtopoulos, S., Nieuwenhuijsen, M., Phillips, D.H., et al.: The exposome in practice: design of the EXPOsOMICS project. International Journal of Hygiene and Environmental Health 220(2), 142–151 (2017)
23. Wild, C.P.: Complementing the genome with an "exposome": the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiology, Biomarkers & Prevention 14, 1847–1850 (2005)
24. Wild, C.P.: The exposome: from concept to utility. International Journal of Epidemiology 41(1), 24–32 (2012)


Complex Network Analysis and Big Omics Data

Blagoj Ristevski [0000-0002-8356-1203], Snezana Savoska [0000-0002-0539-1771] and Pece Mitrevski [0000-0002-0300-7115]

"St. Kliment Ohridski" University - Bitola, Faculty of Information and Communication Technologies - Bitola, ul. Partizanska bb, 7000 Bitola, Republic of Macedonia
[email protected], [email protected], [email protected]

Abstract. In this paper we describe various biological omics data (e.g. genomics, epigenomics, transcriptomics, proteomics, metabolomics and microbiomics) generated using high-throughput sequencing technologies. These omics data are generated in huge amounts and exhibit the 6 V's properties of big data. To discover hidden knowledge in such big omics data, complex network analysis is used. Biological complex networks such as gene regulatory and protein-protein interaction networks are valuable resources for discovering disease genes and pathways, and for investigating the topological properties of the genes most strongly associated with a particular disease. The inference of biological regulatory networks from different high-dimensional omics data is a fundamental and very complex task in bioinformatics. Various big omics data have great potential to uncover diverse perspectives on biological complex networks. Taking into account the properties of big omics data, we suggest some directions for further work.

Keywords: Complex Networks ∙ Big Data ∙ Omics Data ∙ Biological Networks.

1 Introduction

Currently, there are several omics technologies for the analysis of biological samples, such as genomics, epigenomics, transcriptomics, proteomics, and metabolomics, which perform analyses at the level of genes, the epigenome, the transcriptome, the proteome, and the metabolome, respectively [9]. Advances in these omics technologies have enabled personalized medicine at an extraordinarily detailed molecular level [10].

Recent studies have shown that miRNAs are key players in regulation, i.e., in many biological processes in metabolism, proliferation, differentiation, development, apoptosis, cellular signaling, cancer development, and metastasis. One of the most essential regulatory roles of proteins is transcription regulation. Proteins that bind to DNA sequences and regulate the transcription of DNA, and hence gene expression, are called transcription factors. Transcription factors can inhibit or activate the expression of target genes. Techniques to measure the gene expression level in biological samples span from gene-specific techniques such as quantitative polymerase chain reaction (qPCR) to high-throughput technologies such as microarrays, serial analysis of gene expression (SAGE), or next-generation sequencing (NGS) (i.e., RNA-Seq) [11].

The inference of biological regulatory networks from different high-dimensional omics data is an essential, very complex, and computationally demanding problem in bioinformatics that requires high-performance computing and the development of suitable algorithms. These huge amounts of experimental omics data have the 6 V's properties of big data (volume, value, velocity, variety, veracity, variability) and hence can be called big omics data.

Volume refers to the amount of generated and collected data, while value refers to their coherent analysis, which is valuable to researchers. Velocity refers to data in motion as well as to the speed and frequency of data creation, processing, and analysis. The complexity and heterogeneity of multiple datasets define the variety of big data. Data quality, relevance, uncertainty, reliability, and predictive value describe the veracity of big data, while variability is related to the consistency of data over time [8].

The remainder of this paper is structured as follows. Section 2 describes the concepts related to big omics data. The subsequent section describes complex networks, whereas Section 4 describes the analysis of complex networks in biology, based on experimental high-throughput data. The last section provides conclusions and suggestions for future work.

2 Big Omics Data

Gene-specific techniques and high-throughput experimental technologies used in bioinformatics enable the generation of huge amounts of various biological omics data. These omics data and their mutual interactions can elucidate human diseases, such as cancer, at the systems and molecular levels, and thereby provide the knowledge necessary to control clinical symptoms, to make better diagnoses and prognoses, and to develop more effective treatments of diseases. Combining these different layers of experimental omics data can help to identify candidate target genes involved in cancer and other diseases.

One of the most suitable ways to obtain better insights into these data, to infer non-observable interactions, and to visualize connections, links, and groups of entities is network analysis. Network analysis yields multi-networks, defined as a set of N nodes that interact mutually in M different layers, where each layer reflects a distinct type of interaction linking the same pairs of nodes [1]. Multi-networks combine different layers of experimental data, such as genomic, proteomic, and molecular interaction data, and can be gene regulatory networks, microRNA-target networks, transcription factor-target networks, and protein-protein interaction networks [1]. The aim of multi-network analysis is to discover unknown information about the structure and dynamics of complex biological systems, as well as to discover which entities or genes are involved in oncogenic processes or in other diseases [1].

The most common human diseases, such as obesity, autism, diabetes, and schizophrenia, are complex diseases; that is, they result from the combination of multiple genetic and


environmental factors [10]. Moreover, the microbiome has also been related to many human diseases, mainly of metabolic origin, and for this reason it is attracting the growing interest of scientists for the information it can provide about the onset and progression of these diseases. The inference of biological regulatory networks from different high-dimensional omics data is a very complex problem in bioinformatics because of the requirement for high-performance computing and for suitable algorithms that are able to work in parallel.
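The multi-network notion used in this section (N shared nodes interacting across M layers) can be captured with a minimal data structure. The class, layer, and node names below are illustrative assumptions, not part of any cited work:

```python
from collections import defaultdict

class MultiNetwork:
    """Shared nodes connected in several named layers, each layer holding
    its own undirected edge set (e.g. 'ppi', 'mirna_target')."""
    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.layers = defaultdict(set)  # layer name -> set of frozenset edges

    def add_edge(self, layer, u, v):
        if u not in self.nodes or v not in self.nodes:
            raise ValueError("unknown node")
        self.layers[layer].add(frozenset((u, v)))

    def degree(self, node, layer):
        """Degree of a node within a single layer."""
        return sum(node in edge for edge in self.layers[layer])

mn = MultiNetwork(["TP53", "MDM2", "miR-34a"])
mn.add_edge("ppi", "TP53", "MDM2")            # protein-protein interaction
mn.add_edge("mirna_target", "miR-34a", "TP53")  # miRNA targets a gene
deg = mn.degree("TP53", "ppi")                # 1
```

Per-layer degrees like this are the starting point for the topological analyses discussed later (motifs, communities), which can then be compared across layers.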

3 Complex Networks

Networks are essential tools for modelling and analyzing complex interactions between entities in biology, as well as in medicine, physics, neuroscience, and technical and social networks. Once extracted from suitable data, networks enable understanding of the fundamental structures that control many complex systems through the detection of higher-order connectivity patterns [2]. Typical representatives of these densely connected patterns, or network subgraphs, are called network motifs and can be categorized as feed-forward loops, open bidirectional edges, triangular motifs, or two-hop paths.

For instance, biological complex systems transition from one state to another, underlying a variety of phenomena from cell differentiation to recovery from disease [4]. The most suitable way to model the behavior of biological systems is with networks whose nodes stand for dynamic entities such as genes, proteins, microRNAs, metabolites, and miRNA-protein complexes, while the links and interactions among them are represented by network edges [4]. Biological complex networks such as gene regulatory and protein-protein interaction networks are valuable resources for discovering disease genes and pathways, and for investigating the topological properties of the genes most strongly associated with a particular disease.

Hybrid Functional Petri Nets (HFPNs) have been deliberately introduced to model biological networks [15]. Besides the coexistence of both continuous places associated with real variables (e.g. concentration levels) and discrete places (marked with tokens) allowed in Hybrid Petri Nets (HPNs), several additional features have been included: continuous transition firing rates can depend on the values of the input places, and the weights of arcs can be defined as functions of the markings of the connected places [14]. These, in turn, increase the modelling power of HPNs and their application in biological network analysis.
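To make the motif idea concrete: a feed-forward loop in a directed network is a node triple (a, b, c) with edges a→b, b→c, and a→c all present. A brute-force count over a small edge list might look like this (the toy regulatory graph is invented for illustration):

```python
from itertools import permutations

def count_feed_forward_loops(edges):
    """Count feed-forward loops: ordered node triples (a, b, c) such that
    the directed edges a->b, b->c and a->c are all present."""
    edge_set = set(edges)
    nodes = {n for e in edges for n in e}
    return sum(
        ((a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set)
        for a, b, c in permutations(nodes, 3)
    )

# toy graph: X regulates Y and Z, and Y also regulates Z
ffl = count_feed_forward_loops([("X", "Y"), ("Y", "Z"), ("X", "Z")])  # 1
```

Enumerating all triples is O(n^3), so real motif-detection tools use far more efficient subgraph-census algorithms; this sketch only defines the pattern precisely.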
Gene regulatory networks can indeed provide significant insights into the mechanisms of complex diseases such as cancer. However, reconstructing these networks remains a challenging task because of the complex regulatory mechanisms by which genes are influenced, such as transcription factor concentration or availability and non-coding RNAs with different regulatory functions, but also because of the presence of gene or genomic modifications such as mutations, single nucleotide polymorphisms (SNPs), copy number variations (CNVs) or epigenetic modifications [6], [7]. Taking into account that gene expression is a product of complex interactions of many biological entities and processes, multi-omics data are required for more reliable network reconstruction and for better performance.

249 ICT Innovations 2019 Web Proceedings ISSN 1857-7288

Since HPNs provide an appropriate way to represent protein concentration dynamics coupled with discrete switches, HPN modelling of gene regulatory networks has been considered by Matsuno et al. [16]. DNA modification, transcription, translation, and post-transcriptional and post-translational modifications are a number of stages whose representation would be required for a detailed modelling of gene regulatory networks. Regulatory networks are closely controlled at the transcriptional and post-transcriptional level, and their changes invoke changes in the expression levels of genes that participate in protein-protein interactions or that are targets of miRNAs or of other regulatory non-coding RNAs. Indeed, the mutual interaction between miRNAs and their target genes or transcription factors makes miRNAs among the most important players in gene regulation. For this reason, miRNAs have a primary role in many human diseases, such as cancer, neurological disorders and syndromes, and rheumatic, cardiovascular and metabolic diseases [12].
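The discrete part of a Petri-net model can be illustrated with a minimal token-firing sketch (plain Python; place and transition names are purely illustrative, and a real HFPN would additionally carry continuous places and marking-dependent firing rates):

```python
def fire(marking, transition):
    """Fire one discrete Petri-net transition if it is enabled.
    marking: {place: token_count}; transition: (input_places, output_places)."""
    inputs, outputs = transition
    if any(marking.get(p, 0) < 1 for p in inputs):
        return marking  # transition not enabled: marking is unchanged
    new = dict(marking)
    for p in inputs:
        new[p] -= 1     # consume one token per input place
    for p in outputs:
        new[p] = new.get(p, 0) + 1  # produce one token per output place
    return new

# Toy "transcription" step: gene plus free polymerase yields mRNA,
# returning both the gene and the polymerase (names are illustrative only).
t_transcribe = (["gene", "polymerase"], ["gene", "polymerase", "mRNA"])
print(fire({"gene": 1, "polymerase": 1}, t_transcribe))
```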

4 Complex Networks Analysis

Two main stages of network analysis are network reconstruction and network interrogation. Reconstruction, or reverse engineering, of biological networks is the data-driven inference of nodes (e.g. genes, proteins, miRNAs, metabolites, miRNA-protein complexes) and of the edges among nodes. The aim of systematic network analysis, called network interrogation, is to obtain optimal information insights from reconstructed networks. All these data-driven inference tasks can be carried out using supervised, unsupervised and semi-supervised machine learning techniques [5]. Using these techniques can help to discover hidden patterns in big omics data and to predict complex phenotypes. Reverse engineering of gene regulatory networks, which employs high-dimensional gene expression data, demands the development of robust and computationally efficient algorithms for network inference. These algorithms should resolve the scalability and usability of gene regulatory networks inferred from experimental omics data [3]. Biological networks are evaluated through the analysis of their properties, such as community detection and link prediction. Community detection in a complex network identifies groups of nodes that are densely connected to each other but sparsely connected to the nodes that belong to the other groups in the network. Analysing the structure and features of real complex networks, as well as community detection algorithms, has increasingly become a challenging research topic in the field of complex networks and big data. The identification of the resulting communities of densely connected nodes is essential, as they may help to reveal a priori unknown functional modules. The most commonly used community detection algorithms are Louvain, modularity optimization and Infomap. The aim of the Louvain algorithm is to maximize the modularity of the entire community partition.
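The quantity that Louvain (and modularity optimization generally) maximizes can be computed directly for any candidate partition; the sketch below (plain Python, illustrative only) evaluates the modularity Q = Σ_c [L_c/m − (d_c/2m)²] of a partition of an undirected graph, where L_c is the number of intra-community edges, d_c the total degree inside community c, and m the number of edges:

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected graph:
    Q = sum over communities c of [L_c/m - (d_c/(2m))^2]."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    comm = {n: i for i, group in enumerate(communities) for n in group}
    # fraction of edges falling inside a community
    q = sum(1 for u, v in edges if comm[u] == comm[v]) / m
    # minus the fraction expected under the degree-preserving null model
    for group in communities:
        d_c = sum(degree[n] for n in group)
        q -= (d_c / (2 * m)) ** 2
    return q

# Two triangles joined by a single bridge edge: the natural 2-community
# partition scores clearly higher than lumping all nodes together.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(modularity(edges, [{1, 2, 3}, {4, 5, 6}]))  # about 0.357
```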
In biological networks, the existence of a link between two nodes must usually be proved by very costly laboratory experiments [13]. On the other hand, omics data used in network reconstruction may contain inaccurate information, which leads to erroneous link detection. Link prediction algorithms are used to identify these spurious links and to predict actual network links.
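A simple neighbourhood-based predictor illustrates the idea (plain Python, a sketch only; link-prediction studies employ many such similarity indices, e.g. common neighbours or Adamic-Adar, alongside model-based scores [13]). It scores each unlinked node pair by the Jaccard coefficient of their neighbour sets, so pairs with highly overlapping neighbourhoods are flagged as likely missing links:

```python
def jaccard_link_scores(edges):
    """Score non-adjacent node pairs of an undirected graph by the Jaccard
    coefficient of their neighbour sets; high scores suggest missing links."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    nodes = sorted(nbrs)
    scores = {}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v in nbrs[u]:
                continue  # already linked: nothing to predict
            union = nbrs[u] | nbrs[v]
            if union:
                scores[(u, v)] = len(nbrs[u] & nbrs[v]) / len(union)
    return scores

# A and D are not linked but share both neighbours B and C -> score 1.0.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
print(jaccard_link_scores(edges))
```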

5 Conclusion

The development of new machine learning techniques and approaches makes commonly used machine learning algorithms easier for bioinformaticians to adopt, and they are becoming essential tools for the analysis of big omics data. As multiple omics technologies can provide a clearer picture of cell functions, but also of alterations due to diseases, the integration of these technologies with traditional clinical tests will become routine in future clinical health and disease investigations. Big omics data of different natures have great potential for the analysis of complex biological networks from different perspectives. Therefore, the integration of heterogeneous data for the inference of regulatory networks from available biological knowledge currently represents a theme of great interest for computer scientists.

Acknowledgement

This paper was supported by the Ministry of Education and Science of the Republic of Macedonia and the Ministry of Science and Technology (MOST) of the Government of the People’s Republic of China within the bilateral project “Modeling, Simulation and Analysis of Complex Networks in Biology”.

References

1. Cantini, L., Medico, E., Fortunato, S., Caselle, M.: Detection of gene communities in multi-networks reveals cancer drivers. Scientific Reports 5:17386 (2015)
2. Benson, A.R., Gleich, D.F., Leskovec, J.: Higher-order organization of complex networks. Science 353(6295):163-166 (2016)
3. Fronczuk, M., Raftery, A.E., Yeung, K.Y.: CyNetworkBMA: a Cytoscape app for inferring gene regulatory networks. Source Code for Biology and Medicine 10(1):11 (2015)
4. Dong, X., Yambartsev, A., Ramsey, S.A., Thomas, L.D., Shulzhenko, N., Morgun, A.: Reverse enGENEering of regulatory networks from big data: a roadmap for biologists. Bioinformatics and Biology Insights 9:BBI-S12467 (2015)
5. Noor, E., Cherkaoui, S., Sauer, U.: Biological insights through omics data integration. Current Opinion in Systems Biology (2019)
6. Zarayeneh, N., Ko, E., Oh, J.H., Suh, S., Liu, C., Gao, J., Kim, D., Kang, M.: Integration of multi-omics data for integrative gene regulatory network inference. International Journal of Data Mining and Bioinformatics 18(3):223 (2017)
7. Yuan, L., Guo, L.H., Yuan, C.A., Zhang, Y., Han, K., Nandi, A.K., Honig, B., Huang, D.S.: Integration of multi-omics data for gene regulatory network inference and application to breast cancer. IEEE/ACM Transactions on Computational Biology and Bioinformatics 16(3):782-791 (2018)
8. Ristevski, B., Chen, M.: Big data analytics in medicine and healthcare. Journal of Integrative Bioinformatics 15(3) (2018)
9. Gallagher, I.J., Jacobi, C., Tardif, N., Rooyackers, O., Fearon, K.: Omics/systems biology and cancer cachexia. In: Seminars in Cell & Developmental Biology, vol. 54, pp. 92-103. Academic Press (2016)
10. Karczewski, K.J., Snyder, M.P.: Integrative omics for health and disease. Nature Reviews Genetics 19(5):299 (2018)
11. Ristevski, B.: Overview of computational approaches for inference of microRNA-mediated and gene regulatory networks. In: Advances in Computers, vol. 97, pp. 111-145. Elsevier (2015)
12. Ristevski, B.: A survey of models for inference of gene regulatory networks. Nonlinear Analysis: Modelling and Control 18(4):444-465 (2013)
13. Lü, L., Zhou, T.: Link prediction in complex networks: a survey. Physica A: Statistical Mechanics and its Applications 390(6):1150-1170 (2011)
14. Alla, H., David, R.: Continuous and hybrid Petri nets. Journal of Circuits, Systems, and Computers 8(1):159-188 (1998)
15. Matsuno, H., Tanaka, Y., Aoshima, H., Matsui, M., Miyano, S.: Biopathways representation and simulation on hybrid functional Petri net. In Silico Biology 3(3):389-404 (2003)
16. Matsuno, H., Doi, A., Nagasaki, M., Miyano, S.: Hybrid Petri net representation of gene regulatory network. In: Biocomputing, pp. 341-352 (2000)


Big Data analysis and reproducibility: new challenges and needs of the post-genomics era

Aspasia Efthimiadou 1[0000-1111-2222-3333], Eleni Papakonstantinou 2[1111-2222-3333-4444], Dimitrios Vlachakis 2[1111-2222-3333-4444], 3[2222-3333-4444-5555], 4[3333-4444-5555-6666], Domenica D’Elia 5[4444-5555-6666-7777], Erik Bongcam-Rudloff 6[5555-6666-7777-8888]

1 Hellenic Agricultural Organization – Demeter, Institute of Soil and Water Resources, Department of Soil Science of Athens, Lycovrisi, Greece
2 Genetics and Computational Biology Group, Laboratory of Genetics, Department of Biotechnology, Agricultural University of Athens, 75 Iera Odos, 11855, Athens, Greece
3 Division of Endocrinology and Metabolism, Center of Clinical, Experimental Surgery and Translational Research, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
4 Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Strand, London WC2R 2LS, UK
5 CNR Institute for Biomedical Technologies, Bari, Italy
6 SLU-Global Bioinformatics Centre, Department of Animal Breeding and Genetics, University of Agricultural Sciences, Uppsala, Sweden

*Correspondence should be addressed to Aspasia Efthimiadou ([email protected]) and Eleni Papakonstantinou ([email protected])

Abstract. Reproducibility constitutes an important criterion for the evaluation of the quality of research and the robustness of results. However, while big data are accumulating at a fast pace, data management, integration, sharing and interpretation become rather challenging. The massive amounts of different kinds of data produced need standardised approaches for their production and processing. To enable researchers to assure and ensure the quality of data and research results, uniform and widely adopted procedures, pipelines, data repositories, bioinformatics tools and data-sharing platforms are mandatory. Many efforts in this direction have been made in the last decade by scientists, governing bodies, funding agencies and publishers. The application of FAIR (Findable, Accessible, Interoperable, Reusable) and Open Science principles to life sciences research is the most important result produced by the combination of all these efforts. Nevertheless, a lot of work is still necessary for these principles and their application to be integrated and fully implemented by the whole scientific community. Obstacles arise from many different factors; to mention only the most relevant ones, the still unequal pre- and post-degree education of researchers across EU countries, and the difficulties scientists face in adopting standards and standard operating procedures (SOPs) because these are lacking or guidelines are missing. In this scenario, the EMBnet workshop will discuss the main factors affecting the reproducibility of life sciences research and explore the need for standards, with a particular focus on the reproducibility of computer-aided drug design and metagenomic assemblies.


Keywords: Life Sciences; Reproducibility; Standards; Computer-aided drug design; Metagenomic Assemblies.

1 Introduction

Reproducibility constitutes an essential criterion for the evaluation of the quality of research and the robustness of results. A fundamental pillar of the life sciences is that published, acknowledged research should be reproducible, leading to comparability between experiments and laboratories and validating and advancing research. While big data are accumulating at a fast pace, data management, integration, sharing and interpretation become rather challenging. The massive amounts of different kinds of data produced need standardised approaches for their production, computational processing, integration and analysis. To enable researchers to ensure and assure the quality of data and research results for their reuse, uniform and widely adopted procedures, pipelines, data repositories, bioinformatics tools and data-sharing platforms are mandatory. In this paper, we summarise the most relevant initiatives, obstacles and possible solutions related to data reproducibility in the life sciences that will be discussed at the EMBnet workshop - Big Data analysis and reproducibility: new challenges and needs of the post-genomics era - organised as a special session of the 11th ICT Innovations Conference 2019.

2 Related Initiatives

Several initiatives have been introduced to answer the need for standardisation of life sciences data so that they are FAIR - Findable, Accessible, Interoperable and Reusable - and to ensure their effective and efficient digitalisation [1]. Standardisation and quality management are indeed essential requirements for research data to be easily accessible, shareable and comparable, and therefore to enable data reuse, reduce research costs, and improve the efficiency and technological transfer of research. To answer these pressing needs of the scientific research community, in 2011 the European Commission (EC) proposed a reform package including a new regulation on European standardisation aiming to increase the system’s inclusiveness, speed, responsiveness, transparency, flexibility and scope, and to promote the identification of strategic priorities. One of the identified priority domains was, for example, biotechnology and biotechnology processes (e.g., ontologies, biobanks/bioresources, analytical methods). Therefore, the International Organisation for Standardisation (ISO) established a technical committee in 2013 to identify standardisation needs and gaps in these fields (ISO/TC 276 initiative). Similar efforts have been dedicated to other life sciences research domains. CDISC1 (Clinical Data Interchange Standards Consortium) aims to standardise and interchange clinical trial data in a global, platform-independent way [2]. The MIBBI2

1 https://www.cdisc.org/
2 http://www.dcc.ac.uk/resources/metadata-standards/mibbi-minimum-information-biological-and-biomedical-investigations


(Minimum Information about a Biomedical or Biological Investigation) portal groups nearly 40 checklists of minimum information for various biological disciplines, which the MIBBI Foundry is now analysing to create an inter-compatible, extensible community of standards. COMBINE3, the COmputational Modeling in BIology Network, is an initiative to coordinate the development of the various community standards and formats for computational models, with the aim of producing a set of interoperable and non-overlapping standards covering all aspects of modelling in biology. Specific ontologies have also been developed for the classification and standardisation of terminologies, which are extremely useful to facilitate the comparison, integration and interpretation of data. A catalogue of biomedical ontologies is available at BioPortal4. Other important actions are represented by specific projects, such as AllBio5, a Coordination Action funded under FP7 (FP7-KBBE), which gathered information about existing bioinformatics tools and Web services across the life sciences (from humans, animals and plants to microorganisms), and identified areas where more development or greater integration and interoperability was required. In addition to identifying development areas, AllBio established bridges between diverse life-science communities in the field of bioinformatics. The result was the constitution of a new consortium of many different research institutes across Europe that are currently working together on the specific aims of the COST Action CHARME6 - Harmonising standardisation strategies to increase efficiency and competitiveness of European life-science research (CA15110).
The main objective of CHARME is to unite experts from all relevant standardisation-related fields in order to improve the coordination of standard development efforts, reducing redundancy, shortening the design cycle and providing better quality products; to influence ontology and data-transfer format issues in a concerted way; to monitor technological developments and their impact on future needs; and to disseminate information more efficiently.

3 Factors influencing reproducibility in life sciences research

In the life sciences, the data produced may differ in terms of the instruments used or the way of recording, making the need for standards mandatory. Poor research practices and experimental design can negatively impact research results. Protocols for good laboratory practice and precise, well-defined techniques are required to achieve reproducibility. Factors that affect reproducibility can also vary from the lack of access to raw data, research materials and adopted methodologies to cognitive bias, where the researcher’s background and judgement can affect the outcome of the results. Inequalities in the pre- and post-degree education of researchers from different countries of the EU, and in the adopted standards and standard operating procedures (SOPs), or even a lack of guidelines, can lead to irreproducible research. Ongoing training of researchers on current guidelines and standards for structuring an

3 http://co.mbine.org/
4 http://bioportal.bioontology.org/
5 https://cordis.europa.eu/project/rcn/100520/factsheet/en
6 https://www.cost.eu/actions/CA15110/#tabs|Name:overview


experiment and performing analysis could improve experimental reproducibility. Indeed, despite the challenges, the systematic and comprehensive multidimensional data generated in life sciences research can be approached with reproducible pipelines and optimal guidelines and specifications. A representative example is the Data Carpentry initiative, a non-profit scientific community that develops and teaches workshops on data management and analysis for researchers [3]. The Carpentries aim to set up standardised pipelines that facilitate and empower researchers of any background to produce meaningful results. They have developed a genomics curriculum for data analysis, including best practices formed as a consensus of the experience, contributions and evaluation of experts and non-experts in the field. Through this pipeline, a researcher can perform assemblies on genomic and metagenomic data, ensuring the reliability of the results.

4 Reproducibility of computer-aided drug design and metagenomic assemblies

The increasing availability of open-source software and open data for cheminformatics and structural bioinformatics has shaped research in the field of computer-aided drug design (CADD) [4]. An astounding number of bioinformatics software tools have been developed, implementing different approaches to making use of the produced bio-datasets. One would expect in silico research to be easier to reproduce. Still, even if the software is usable and accessible (open-source), the reproducibility of the results is not guaranteed. The different combinations of computational tools, parameters and workflows one can apply enable powerful, yet not reproducible, solutions. In multiple steps of a computer-aided drug design pipeline, modules require a randomiser to initialise, for example, a position in the search space. In a homology modelling pipeline, the first step of the algorithm is to set up and optimise the sequence alignment between the target protein and the chosen templates. The ever-growing number of sequence alignment algorithms and their different user-defined parameters can output different results, leading to a non-standard selection of the structural template. The same applies to algorithms that perform partial geometry alignment, energy minimisation and molecular dynamics. Random acceleration molecular dynamics uses a randomiser to assign a randomly oriented acceleration to the centre of mass of one group of atoms and initiate the simulation. Machine learning techniques that can be applied in each step of a CADD pipeline (target identification and validation, compound screening and inhibitor design, as well as preclinical and clinical drug development) are also based on randomness. An additional factor that affects reproducibility is the human factor, i.e. the user.
The best results are more frequently achieved when computational methods are supervised by expert users throughout the pipeline, who must choose an optimal template for homology modelling, and the correct force field and thermodynamic parameters for energy minimisation and molecular dynamics simulations [5].
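One common mitigation for the stochastic steps described above is to fix and report the random seed of every randomised module. The sketch below (plain Python; randomized_docking_step and the pose names are hypothetical stand-ins for a real stochastic CADD module) shows the pattern of giving each step its own explicitly seeded random number generator:

```python
import random

def randomized_docking_step(candidates, seed=None):
    """Toy stand-in for a stochastic CADD step (e.g. seeding a position in a
    search space): with a fixed seed the 'random' choice is reproducible."""
    rng = random.Random(seed)  # private RNG; does not disturb global state
    return rng.choice(candidates)

poses = ["pose_a", "pose_b", "pose_c"]
# Same seed -> same result on every run, so the step can be re-run exactly.
assert randomized_docking_step(poses, seed=42) == randomized_docking_step(poses, seed=42)
```

Reporting such seeds alongside tool versions and parameters lets another group replay the same stochastic trajectory rather than merely a statistically similar one.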


As with computer-aided drug design, genomic and metagenomic assemblies raise the same reproducibility challenges. In the post-genomics era, the ability to generate massive NGS data has outrun the ability to store, process and analyse these data. Additionally, the bioinformatics tools developed for genomic data analysis are numerous, and the lack of a widely adopted, established, open-source platform for genomic and metagenomic assemblies reinforces the irreproducibility of the results. Considerable time, effort and money can be wasted because of misleading or missing information on the methodologies, workflows and software used when researchers try to reproduce published results. To this end, various Workflow Management Systems (WMS) have been designed to establish automated data-driven workflows [6]. However, these attempts have not yet been widely adopted by the scientific community. Initiatives such as the Data Carpentries, which aim both at training researchers and at standardising workflows with open-source software, may prove fruitful. Overall, reproducibility in research is fundamental for the reliability of scientific methods and experimental design. However, only a small percentage of published work is reproducible (~11% in 2012) [7]. Factors such as accessibility of the data, non-standard management of raw data, data analysis and meta-analysis, software accessibility and usability, and even the core of computational approaches that require randomness for the efficiency of the algorithms can lead to non-reproducible results. Standard guidelines and well-defined practices are mandatory to ensure the standardisation of scientific research and results and to enable scientists to advance research.
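One ingredient that workflow systems rely on to make a pipeline step verifiable can be sketched simply: recording cryptographic checksums of the exact inputs and parameters, so that a later run can confirm it reproduces the same step. This is a minimal illustration in plain Python; the record layout and the names assembler_x and k are hypothetical, not taken from any particular WMS:

```python
import hashlib
import json

def record_provenance(params, data_bytes):
    """Minimal provenance record for one pipeline step: hash the exact input
    data and the parameter set so a rerun can be checked against the record."""
    return {
        "params": params,
        "input_sha256": hashlib.sha256(data_bytes).hexdigest(),
        # sort_keys makes the parameter hash independent of dict ordering
        "params_sha256": hashlib.sha256(
            json.dumps(params, sort_keys=True).encode()).hexdigest(),
    }

rec1 = record_provenance({"k": 31, "tool": "assembler_x"}, b"ACGTACGT")
rec2 = record_provenance({"tool": "assembler_x", "k": 31}, b"ACGTACGT")
assert rec1 == rec2  # identical inputs and parameters -> identical record
```

Real WMSs extend this idea to whole dependency graphs, but the principle is the same: a result is reproducible only if the inputs that produced it are identifiable bit for bit.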

References

1. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., et al.: The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3:160018 (2016)
2. Hume, S., Chow, A., Evans, J., et al.: CDISC SHARE, a global, cloud-based resource of machine-readable CDISC standards for clinical and translational research. AMIA Joint Summits on Translational Science 2017:94-103 (2018)
3. Pawlik, A., van Gelder, C., Nenadic, A., et al.: Developing a strategy for computational lab skills training through Software and Data Carpentry: experiences from the ELIXIR Pilot action. F1000Research 6:ELIXIR-1040 (2017)
4. Sydow, D., Morger, A., Driller, M., Volkamer, A.: TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data. Journal of Cheminformatics 11(1):29 (2019)
5. Barril, X.: Computer-aided drug design: time to play with novel chemical matter. Expert Opinion on Drug Discovery 12(10):977-980 (2017)
6. Leipzig, J.: A review of bioinformatic pipeline frameworks. Briefings in Bioinformatics 18(3):530-536 (2017)
7. Begley, C.G., Ellis, L.M.: Drug development: raise standards for preclinical cancer research. Nature 483(7391):531-533 (2012)
