2021 International Conference on Digital Contents: AICo(AI, IoT and Contents) Technology (DigiCon-21)

Message from the General Chairs

DigiCon-21

On behalf of the organizing committees, it is our pleasure to welcome you to the 2021 International Conference on Digital Contents (DigiCon-21), to be held in Jeju, Korea, on July 7-8, 2021.

DigiCon-21 is the 6th event in the DigiCon conference series, which is organized and hosted by the Korea Digital Contents Society (DCS).

We would like to sincerely thank the invited speaker, who kindly accepted our invitation and, in doing so, helped to meet the objectives of the conference:

● Mashhur Sattorov, SW Development Engineer, Amazon, Canada: Best practice for building low-latency and high-volume APIs with Serverless Cloud Computing

DigiCon-21 will be held at Ocean Suites Jeju Hotel, Jeju, Korea. There are numerous local attractions that attendees can enjoy, including:

● Jeju National Museum, opened in June 2001, displays 950 pieces from its collection of 7,230 items year-round. Exhibition Room 1, devoted to prehistory and archaeology, holds 250 pieces of Paleolithic and Neolithic craftwork excavated at Gosan-ri.

● Jungmun Saekdal Beach is surrounded by steep cliffs that create a cozy and romantic atmosphere. It is situated within the Jungmun Tourism Complex, where visitors can enjoy a summer vacation using a variety of entertainment and recreation facilities.

● Seopji-Koji, jutting out from the eastern seashore of Jeju Island, offers one of the most scenic views, with bright yellow canola fields and Seongsan Sunrise Peak as a backdrop.

● The Jeju Shinyoung Cinema Museum offers rich nostalgia for everyone, beautiful memories for people in love, and endless dreams for children.

● Hallim Park is a multi-faceted park where palm trees tower along the road. Its “Subtropical Garden” greenhouse showcases rare subtropical plants and the wild plants of Jeju Island.

● Manjang Cave, situated at Donggimnyeong-ri, Gujwa-eup, North Jeju, 30 kilometers east of Jeju City, was designated Natural Monument No. 98 on March 28, 1970. The 7,416-meter-long cave has been officially recognized as the longest lava tube in the world.

● Yongduam is a rock said to depict a writhing dragon bearing a grudge: according to legend, a dragon living in the Dragon Kingdom wanted to ascend to heaven, but it was not an easy thing to do.


● Hangpaduri Fortress is National Historic Site No. 396. The site has special historical meaning as the place from which the special capital defense unit fought, strongly resisting the Mongol invasion until the last possible moment.

In addition, there are many places to enjoy traditional Jeju foods such as Galchiguk (hairtail fish soup), Kkwong-toryeom (sliced pheasant meat boiled lightly with vegetables), Kkwong-memil guksu (buckwheat noodles with pheasant meat), Jarihoe (raw damselfish), Jarimulhoe (raw damselfish in chilled broth), Okdom-gui (broiled sea bream), Seonggeguk (sea urchin soup), Jeonbokjuk (rice porridge with abalone), Obunjagi kettle rice, black pig bulgogi, and Bingddeok (bing rice cake).

Finally, we would like to express our deepest appreciation to the DigiCon-21 Committee and Chairs for their constant support and invaluable guidance and advice. Thanks also to all the volunteers, reviewers, and the many people who put extraordinary effort into this event. We hope you find DigiCon-21 enjoyable, and please let us know of any suggestions for improvement.

- Yoon-Ho Kim (Mokwon University, Korea)

- Ka Lok Man (Xi'an Jiaotong-Liverpool University, China)

DigiCon-21 General Chairs


Message from the Program Chairs

DigiCon-21

Welcome to the 2021 International Conference on Digital Contents (DigiCon-21)!

The main focus of DigiCon-21 is not only research issues in Smart Media and Arts & Culture Technology based on Digital Contents; its coverage also extends to a wide spectrum of topics related to Digital Contents with AICo (AI, IoT and Contents) Technology. DigiCon-21 aims to be a professional and informative forum for scientists, engineers, experts, professors, and students in the areas of Digital Contents, Information Technology, Computing Technology, Communication & Network, HCI, Business & Management, Intelligent Systems, and IT-based Convergence Contents Research, as well as in multidisciplinary and interdisciplinary research fields.

During the last decade, many countries have adopted roadmaps for developing future infrastructure in emerging digital content technology areas such as Artificial Intelligence, IoT, digital contents, arts & culture technology convergence, smart media, mobile technology, and ICT application technologies. Our special interest lies in both theoretical and practical research contributions presenting new techniques and concepts, or reporting on experiences and experiments with implementations. We are also interested in applications of theories, as well as tutorials on new technologies and trends, in various digital content and service areas and in all relevant science and engineering areas.

For DigiCon-21, we received many submissions, each of which underwent blind review by at least two reviewers from the technical program committee, which consists of leading researchers from around the globe. We finally selected 20 papers and special presentations in total. All selected papers will be included in the conference proceedings published by the DCS (Korea Digital Contents Society). Without the reviewers' hard work, achieving such high-quality proceedings would have been impossible. We take this opportunity to thank them for their great support and cooperation.

Finally, we would like to thank you all for your participation in our conference and thank all the authors and organizing committee members.

Thank you for your great collaboration and enjoy the conference!

Young-Ae Jung (Sun Moon University, Korea)

Yang-Sun Lee (Hanshin University, Korea)

Chun-Feng Liao (National Chengchi University, Taiwan)

DigiCon-21 Program Chairs


Conference Organization

DigiCon-21

Honorary Chair
Hyun-Bae Yoo (Korea Nazarene University, Korea)

General Chairs

Yoon-Ho Kim (Mokwon University, Korea)
Ka Lok Man (Xi'an Jiaotong-Liverpool University, China)

General Vice-Chairs

Soo-Cheol Ha (Daejeon University, Korea)
Kuo Yuan Hwa (National Taipei University of Technology, Taiwan)

Program Chairs
Young-Ae Jung (Sun Moon University, Korea) (Leading Chair)
Yang-Sun Lee (Hanshin University, Korea)
Chun-Feng Liao (National Chengchi University, Taiwan)

Publication Chairs
Mucheol Kim (Chung-Ang University, Korea)
Ki-Hong Park (Mokwon University, Korea)
Sangoh Park (Chung-Ang University, Korea)

Publicity Chairs
Jinsul Kim (Chonnam National University, Korea)
Sanghyun Seo (Chung-Ang University, Korea)
Yanghoon Kim (Far East University, Korea)
Seung-Taek Ryu (Hanshin University, Korea)
Taeshik Shon (Ajou University, Korea)
Sangsoo Yeo (Mokwon University, Korea)


Registration Chairs
Jae-Myeong Choi (Mokwon University, Korea)
Hwa-Ryung Lee (Digital Contents Society, Korea)

International Liaison Chairs
Hyun-Guk Ryu (Tsukuba University of Technology, Japan)
Jongsung Kim (Kookmin University, Korea)
Guenjun Yoo (GitHub, Australia)
Jae-Geon Jang (Hanshin University, Korea)
Danata D. Acula (University of Santo Tomas, Philippines)

Local Arrangement Chairs
Soo Kyun Kim (Jeju National University, Korea)
Jeong-Dong Kim (Sun Moon University, Korea)
Jaehun Jang (Busan KyungSang College, Korea)

Steering Committee
Ki-Cheon Bang (Namseoul University, Korea)
Hyungjung Kim (Korea University, Korea)
Heau-Jo Kang (Mokwon University, Korea)
Hwajin Park (Sookmyung Women's University, Korea)
Hyun-Bae Yoo (Korea Nazarene University, Korea)

Publicity Committee
Young-Chul Kim (ICT Polytech Institute of Korea, Korea)
Jae-Hak Yu (ETRI, Korea)
Kwangil Ko (Woosong University, Korea)
Jong-Chan Kim (Sunchon National University, Korea)


Invited Talk

Mashhur Sattorov

SW Development Engineer, Amazon, Canada

About SW Development Engineer, Mashhur Sattorov

Mashhur Sattorov was born in Uzbekistan. He received his bachelor's degree from the Tashkent University of Information Technology, specializing in e-Commerce, and obtained his master's degree from Mokwon University in South Korea. He has continued his career across several industries, including airport safety systems and e-Commerce. He currently focuses on developing customer-centric e-Commerce applications at Amazon.

Title: Best practice for building low-latency and high-volume APIs with Serverless Cloud Computing

Service-oriented startups and companies choose cloud services such as Amazon Web Services for their ability to deliver products quickly to competitive markets at the right time. Growing demand for the IaaS and PaaS tiers of cloud services, owing to their flexibility and lower cost compared with owning managed infrastructure and platforms, has shaped the digital market in recent decades. However, how to utilize cloud resources efficiently remains the biggest topic and challenge among engineers. For example, using AWS Lambda, an event-driven serverless computing platform, instead of EC2 instances for infrequent on-demand processes yields economic benefits. Nowadays, however, serverless computing is also an efficient option for frequently invoked microservice processes, such as handling synchronous frontend requests through the API Gateway. In this case, performance requires optimizing the overall infrastructure to keep a balance with cost. In this talk, we focus on best practices for creating low-latency, high-volume APIs using serverless computing with provisioned concurrency.
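As a minimal (hypothetical) sketch of the provisioned-concurrency pattern the talk refers to, the AWS SAM template fragment below keeps a fixed number of Lambda execution environments initialized behind an API Gateway route, avoiding cold-start latency for synchronous frontend requests. The function name, route, runtime, and concurrency value are illustrative assumptions, not taken from the talk.

```yaml
# Hypothetical AWS SAM sketch: a Lambda function behind API Gateway,
# with provisioned concurrency to avoid cold starts on the hot path.
Resources:
  ProductApiFunction:              # illustrative name
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.9
      MemorySize: 512
      AutoPublishAlias: live       # provisioned concurrency attaches to a version/alias
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 10   # pre-warmed environments (tune against cost)
      Events:
        GetProduct:
          Type: Api                # API Gateway REST endpoint
          Properties:
            Path: /products/{id}
            Method: get
```

Provisioned concurrency is billed while it is configured, so the value is normally sized to the steady-state request rate (optionally adjusted with Application Auto Scaling) to maintain the latency/cost balance the abstract mentions.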


PROGRAM SCHEDULE

July 7, 2021 (Wednesday)

09:00-10:00 Local Arrangement Meeting (Chair: Yang-Sun Lee), Cherry Hall

10:15-11:15 DCS & DigiCon-21 Steering Meeting (Chair: Yoon-Ho Kim), Cherry Hall

11:30-12:30 DigiCon-21 Conference Committee Meeting (Chair: Young-Ae Jung), Cherry Hall

12:30-14:30 Lunch. The place may be changed in accordance with the Corona Prevention Guidelines; if changed, we will inform you again.

14:30 Registration, Ocean Suites Jeju Hotel, 2nd Floor

14:30-18:00 Session C-1: Hyodol-bot R&D Seminar, Hanshin University (Closed Session) (Chair: Yang-Sun Lee), Canola Hall

18:10-19:00 DigiCon-21 Committee & DCS Meeting (Chair: Young-Ae Jung), Canola Hall

1. A paper presentation should be made by one of the authors of the paper, during a 20-minute time slot (15 minutes for the presentation itself and 5 minutes for Q&A).
2. All speakers in each session should meet the session chair in its room 10 minutes before the session begins.
3. All speakers have to check their session type: On-site, Hybrid (conducted both online and offline), or Online. We will provide the Zoom login details for the Hybrid and Online sessions.


July 8, 2021 (Thursday)

10:30 Registration, Ocean Suites Jeju Hotel, 2nd Floor

11:00-12:00 (parallel sessions):
Session A-1: Special Session “NSM-21” (Chair: Kun Chang Lee), Cherry Hall
Session B-1 (Poster) (Chair: Young-Ae Jung), Online Zoom Meeting Room (please check the Zoom login details, to be provided later)
Session C-2: Hyodol-bot R&D Seminar, Hanshin University (Closed Session) (Chair: Yang-Sun Lee), Canola Hall

12:00-13:30 Lunch, BlueOcean (2nd Floor). The place may be changed in accordance with the Corona Prevention Guidelines; if changed, we will inform you again.

13:30-14:30 (parallel sessions):
Session A-2 (Chair: Ki-Hong Park), Cherry Hall and Online Zoom Meeting Room (Zoom login details to be provided later)
Session B-2 (Chair: Ka Lok Man), Online Zoom Meeting Room (Zoom login details to be provided later)

13:30-15:45 Session C-3: Hyodol-bot R&D Seminar, Hanshin University (Closed Session) (Chair: Yang-Sun Lee), Canola Hall

14:45-15:45 (parallel sessions):
Session A-3 (Chair: Jeong-Dong Kim), Cherry Hall and Online Zoom Meeting Room (Zoom login details to be provided later)
Session B-3 (Chair: Guenjun Yoo), Online Zoom Meeting Room (Zoom login details to be provided later)

15:45-16:30 Coffee Break, Ocean Suites Jeju Hotel, 2nd Floor

16:30-17:00 Award Ceremony (Chair: Young-Ae Jung), Canola Hall and Online Zoom Meeting Room

17:00-17:50 Invited Talk: Best practice for building low-latency and high-volume APIs with Serverless Cloud Computing, by Mashhur Sattorov, Amazon, Canada (Chair: Jae-Myeong Choi), Canola Hall and Online Zoom Meeting Room

17:50-18:00 Closing Ceremony (Chair: Young-Ae Jung), Canola Hall and Online Zoom Meeting Room

18:30-20:00 DCS 20th Anniversary Ceremony (Chair: Young-Chul Kim), Canola Hall and Online Zoom Meeting Room

1. A paper presentation should be made by one of the authors of the paper, during a 20-minute time slot (15 minutes for the presentation itself and 5 minutes for Q&A).
2. All speakers in each session should meet the session chair in its room 10 minutes before the session begins.
3. All speakers have to check their session type: On-site, Hybrid (conducted both online and offline), or Online. We will provide the Zoom login details for the Hybrid and Online sessions.

TECHNICAL SCHEDULE FOR DIGICON-21

July 7, 2021 (Wednesday)

09:00-10:00 Local Arrangement Meeting

Chair: Yang-Sun Lee (Hanshin University, Korea)

10:15-11:15 DCS & DigiCon-21 Steering Meeting

Chair: Yoon-Ho Kim (Mokwon University, Korea)

11:30-12:30 DigiCon-21 Conference Committee Meeting

Chair: Young-Ae Jung (Sun Moon University, Korea)

12:30-14:30 Lunch

BlueOcean (2nd Floor) (The place may be changed in accordance with the Corona Prevention Guidelines. If changed, we will inform you again.)

14:30 Registration open

Ocean Suites Jeju Hotel, 2nd Floor

14:30-18:00 Session C-1 Hyodol-bot R&D Seminar (Closed Session)

Room: Canola Hall Chair: Yang-Sun Lee (Hanshin University, Korea)

18:10-19:00 DigiCon-21 Committee & DCS Meeting

Room: Canola Hall Chair: Young-Ae Jung (Sun Moon University, Korea)


July 8, 2021 (Thursday)

10:30 Registration open

Ocean Suites Jeju Hotel, 2nd Floor

11:00-12:00 Session A-1 Special Session “NSM-21”

Room: Cherry Hall and Online Zoom Meeting Room
Chair: Kun Chang Lee (Sungkyunkwan University, Korea)

BA-AM: An Advanced Prediction Model for Business Problem Solving Creativity Using fNIRS
Min Gyeong Kim, Kun Chang Lee (Sungkyunkwan University, Korea)
[Abstract] In this paper, deep-learning-based business problem-solving creativity (BPSC) classification using functional near-infrared spectroscopy (fNIRS) is investigated. Creative brain signals during problem-solving were acquired from 28 healthy subjects while they drew a cognitive map. Brain activity was measured with a continuous-wave fNIRS system, focusing on the ventrolateral prefrontal, dorsolateral prefrontal, frontopolar prefrontal, and orbitofrontal cortices. These cortical areas are based on Brodmann areas, each of which is known to have a distinctive character. The Brodmann Area-based Attention Model (BA-AM) is a deep learning model powered by an attention mechanism and LSTM; it attempts to classify creativity, valence, and arousal. For training and testing the model, 5-fold cross-validation was applied to keep the model's performance consistent. The BA-AM architecture achieved an average accuracy of 93.4%, showing that the model is capable of differentiating creative and non-creative states. The proposed approach is promising for detecting creativity and contributes to future neuroscience mining research.

Predicting User’s Perceived Price Perception of e-commerce Products via Distinct UI Modes: A Neuroscience Mining Approach
Francis Joseph Costello, Kun Chang Lee (Sungkyunkwan University, Korea)
[Abstract] User interface (UI) modes are receiving more consideration in application design these days. Two years have passed since Android and iOS announced adoption of the dark UI mode, allowing users to decide on their preferred UI mode. This paper investigates the use of dark- and light-mode UIs in online commerce and attempts to understand users’ perceptions. We investigate this using EEG data obtained from participants and neuroscience mining (NSM) techniques to understand: (a) how consumers’ perceived price perceptions of products differ across distinct UI modes; and (b) how participants’ neuromarkers can explain these findings. Based on our results, we found that the dark UI mode increased the perceived price of the products shown, and this was predicted with the greatest accuracy by an LSTM model using brain data obtained under the dark UI condition. This paper is an example of the power of neuroscience mining in uncovering objective truths about people’s intentions while interacting with distinct UI designs.

11:00-12:00 Session B-1 (Poster Session)

Room: Online Zoom Meeting Room
Chair: Young-Ae Jung (Sun Moon University, Korea)

Emotional Image Recommendation System Based on Biometric Information and Collaborative Filtering
Tae-Yeun Kim, Young-Eun An (Chosun University, Korea)
[Abstract] In this paper, we propose an image recommendation system that considers the user's emotions. We measure ECG and PPG, the user's biometric information, and then implement the recommendation system using a frequency analysis algorithm and an SVM-GA algorithm, improving the reliability of the recommendations through collaborative filtering. We implemented a user-friendly interface that verifies the emotion information of images through recommended values and measurement graphs in a mobile application, thereby enhancing the usability of the proposed system. Experimental results showed that subjects' biometric information (ECG, PPG) was classified and learned by the SVM-GA algorithm into four kinds of emotion information. The average accuracy of the classified data was 89.2%. In addition, user satisfaction was measured at 86.7%, suggesting that the proposed emotion-based search results are comparable to the emotions felt by a person.

A combination of uniform label training strategy and AutoAugment policies to identify the category of foliar diseases in apple trees
Vo Hoang Trong, Le Hoang Anh, Lee JuHwan, Kim JinYoung (Chonnam National University, Korea)
[Abstract] Applying a conventional training method to an imbalanced dataset makes a Convolutional Neural Network (CNN) model biased toward majority classes. In this paper, we propose a uniform label training strategy, in which the distribution of labels in a single batch is uniform. This strategy allows the model to learn samples from minority classes frequently. Furthermore, to increase the variety of samples in minority classes, we apply the AutoAugment policies collected from the ImageNet dataset as a data augmentation technique, in which samples from minority classes have more opportunities to use these transformations than those from majority classes. Finally, we experiment with EfficientNet B0 on the Plant Pathology dataset from the Plant Pathology 2021 Challenge. Applying the Learning Rate (LR) Range test to find the optimal learning rate for each model, results on the Kaggle scoreboard show that our strategy achieves 0.74761 with EfficientNet B0.

Development of virtual reality fire training contents based on standalone HMD
Eun-Jee Song, Hwa-Soo Jin (Namseoul University, Korea)
[Abstract] It is now necessary to raise public safety awareness more than ever, and it is time to conduct disaster response training frequently. As a solution, training systems utilizing virtual reality, a major technology of the 4th industrial revolution, are attracting attention. The virtual training system industry is an ICT convergence industry that implements computer simulations of virtual environments similar to defense, medical, and disaster scenes [1]. Existing fire safety education requires a lot of manpower and cost, and since the training is not practical, the students participating in it experience little immersion. In this paper, we propose a virtual training system for effective fire disaster response training. In particular, we develop fire training virtual reality contents using Oculus Go, a standalone HMD that allows users to experience VR without a smartphone or computer.

Fingerprint-based Cooperative Beam Selection for Cellular-Connected mmWave UAV communication
Sangmi Moon (Korea Nazarene University, Korea), Hyeonsung Kim, Intae Hwang (Chonnam National University, Korea)
[Abstract] In this paper, we propose a fingerprint-based cooperative beam selection scheme for cellular-connected millimeter-wave (mmWave) unmanned aerial vehicle (UAV) communication. The proposed scheme consists of offline fingerprint database construction and online beam cooperation. System-level simulations are performed to assess the UAV effect based on the 3rd Generation Partnership Project New Radio mmWave and UAV channel models. Simulation results show that the cooperative beam selection scheme can reduce the beam sweeping overhead and improve the signal-to-interference-plus-noise ratio and spectral efficiency.

11:00-12:00 Session C-2: Hyodol-bot R&D Seminar, Hanshin University (Closed Session)

Room: Canola Hall
Chair: Yang-Sun Lee (Hanshin University, Korea)

12:00-14:00 Lunch

BlueOcean (2nd Floor) (The place will be probably changed in accordance with the Corona Prevention Guidelines. If changed, we will inform you again.)


13:30-14:30 Session A-2

Room: Cherry Hall and Online Zoom Meeting Room
Chair: Ki-Hong Park (Mokwon University, Korea)

Intelligent Infection Prevention Monitoring System Through CCTV Video Analysis around a Confirmed Patient Distance
Dongsu Lee, Sang-Joon Lee (Chonnam National University, Korea), ASHIQUZZAMAN AKM (XISOM INC, Korea), Seungmin Oh, Jihoon Lee, Yeonggwang Kim, Sangwon Oh and Jinsul Kim (Chonnam National University, Korea)
[Abstract] In this research, we use the K-means clustering technique and a deep-learning-based crowd coefficient to estimate group and group-specific infection rates on a confirmed-patient basis, using CCTV images near newly confirmed patients' movement paths in an intelligent anti-epidemic monitoring system. We show that the PSNR is 21.51 and the final MAE for the entire dataset is 67.98 after 300 cycles of learning over all input images.

A Study on the Improvement of Resource Utilization Using Deep Learning Applications in Virtualized Platforms
Seungmin Oh, Yeonggwang Kim, Sangwon Oh, Jinsul Kim (Chonnam National University, Korea)
[Abstract] Recently, Docker, a lightweight virtualization technology based on the Linux OS, has been widely used in deep learning application services. Because deep learning applications require high-performance computing resources, research on using computing resources efficiently is essential to providing services through virtualization platforms. This paper monitors CPU and GPU resources using cAdvisor and nvidia-smi while operating a deep learning fire/smoke detection model and a Fashion-MNIST classification model in Docker containers. We also studied ways to improve resource utilization by comparing GPU resource usage with the time required for training under optimization of the deep learning applications. We found that, when operating a single deep learning application within a virtualization platform, multiple GPUs used in parallel are on average 88% more efficient than a single GPU, and 19% more efficient when operating multiple deep learning applications. Comparing training-time performance, a single GPU produced more efficient results when operating multiple deep learning applications.

A Study of VR Interactive Storytelling using Planning
Su-ji Jang, Byung-Chull Bae (Hongik University, Korea)
[Abstract] This paper proposes VR interactive storytelling content using planning, which automatically generates an action sequence to build a story. The generated content can provide two types of story endings depending on the player's choices. As story material, we adopt the well-known fairy tale "Little Red Riding Hood". Players can select the VR story's point of view: either a first-person or a third-person perspective. We expect that the player's different viewpoints can contribute to immersion and emotional induction in the VR environment.

Emotion Detection based Models using Electroencephalogram (EEG)
Wajiha Ilyas, Anum Moin, Muzammil Noor, Maryam Bukhari (COMSATS University Islamabad, Attock Campus, Pakistan)
[Abstract] Emotion plays an essential role in building good communication between people, or between a person and a machine. In this research, we propose a solution for detecting human emotion from electroencephalogram (EEG) signals using machine learning approaches. The emotions are classified into four different classes. In the first step, the data are preprocessed: frequency bands are extracted, and features are computed from the Power Spectral Density (PSD) and the Discrete Wavelet Transform (DWT), together with Hjorth parameters and statistical features. For classification, we use a multi-layer perceptron, a decision tree, and KNN. The dataset used in this work is the DEAP dataset, in which the EEG signals of 32 participants were recorded while watching 40 videos (60 seconds each). The classification accuracy of the four classes using the multi-layer perceptron is 77.6% (sad), 79.1% (calm), 76.8% (anger), and 66.4% (happy). Using the decision tree we achieved 78.9%, 79.4%, 77.2%, and 68.5% accuracy, while using KNN the accuracy values are 90.2%, 90.3%, 89.1%, and 86.5%, respectively.


13:30-14:30 Session B-2

Room: Online Zoom Meeting Room
Chair: Ka Lok Man (Xi’an Jiaotong-Liverpool University, China)

A Trustable and Traceable Method for Secondhand Market based on Blockchain Technology with Committee Consensus
Dongkun Hou, Yinhui Yi, Jie Zhang (Xi’an Jiaotong-Liverpool University, China), Ka Lok Man (Xi’an Jiaotong-Liverpool University, China; Swinburne University of Technology Sarawak, Malaysia; KU Leuven, Belgium; Kazimieras Simonavičius University, Lithuania; Vytautas Magnus University, Lithuania)
[Abstract] The secondhand market has been promising in recent years, but the trust problem has hindered its development. Much research addresses the problem through third-party intermediaries, where trust derives from the platform's reputation. Some consumers, however, have no way to recognize the qualification of secondhand products, and it is difficult to safeguard their rights if the third-party platform is malicious. In this paper, consortium blockchain technology is employed to solve these problems, and a novel committee-based Byzantine consensus is designed for the secondhand market. Committee nodes can validate published transactions; other client nodes can also check the recorded transactions and update product states. Recorded transactions must be validated by many committee nodes, and these transactions are transparent, permanent, and traceable. Thus, buyers and sellers can trust the blockchain-based secondhand market. Moreover, compared with a public blockchain, our consortium-blockchain-based secondhand market system can be more secure and efficient.

A Feasibility Study of a Security Mechanism for Blockchain-Powered Intelligent Edge
Jie Zhang (Xi’an Jiaotong-Liverpool University, China), Ka Lok Man (Xi’an Jiaotong-Liverpool University, Suzhou, China; Swinburne University of Technology Sarawak, Malaysia; imec-DistriNet, KU Leuven, Belgium; Kazimieras Simonavičius University, Lithuania; Vytautas Magnus University, Lithuania), Young B. Park (Dankook University, Korea), Jieming Ma, Xiaohui Zhu (Xi’an Jiaotong-Liverpool University, China)
[Abstract] Blockchain-powered intelligent edge is a new paradigm that integrates blockchain and edge computing into the Intelligent Internet of Things (IIoT). It enables edge devices to process data for end devices on behalf of the cloud center, and it also allows the cloud center to trace data via the blockchain. It is expected to address some obstinate problems in many IoT applications, such as data management in intelligent buildings, data privacy in e-health, and response delay in self-driving cars.

Smart Contract-based Operation Technique for Secure Accessing Distributed Contents
Yunhee Kang (Baekseok University, Korea), Young B. Park (Dankook University, Korea)
[Abstract] In a cloud environment, centralized content management naturally embeds security risks, including personal information exposure [4]. In this paper, to address this security risk, the original digital content is divided and encrypted; a system is then designed to distribute it securely in a decentralized P2P storage system, and a prototype is constructed. To achieve this goal, the digital content is encrypted using a symmetric-key encryption technique, and the encrypted content is stored in IPFS storage. In this process, access information is managed by smart contracts. To access the distributed encrypted content, a smart contract is defined for delivering secure digital content.

13:30-15:45 Session C-3: Hyodol-bot R&D Seminar, Hanshin University (Closed Session)

Room: Canola Hall
Chair: Yang-Sun Lee (Hanshin University, Korea)


14:45-15:45 Session A-3

Room: Cherry Hall and Online Zoom Meeting Room
Chair: Jeong-Dong Kim (Sun Moon University, Korea)

Study on the Future Prediction Service Based on Ultra-Low Power Communication Application using the Edge Platform
Yeonggwang Kim, Jinsul Kim (Chonnam National University, Korea)
[Abstract] Along with AI technology, IoT technology has grown rapidly into AIoT (AI of Things) technology. In addition, recognizing data values through sensors and performing deep learning tasks have become a reality. In this paper, we introduce the use of low-power communication protocols to reduce data loss from IoT sensors, and techniques for storing data values in a cloud database through web servers. The collected data are learned with an artificial neural network, and we introduce an intelligent weather-prediction monitoring system that predicts the weather through this model.

Application of TadGAN for detecting Electric Power Usage Anomaly
Sangwon Oh, Jinsul Kim (Chonnam National University, Korea)
[Abstract] In modern society, energy use has increased exponentially compared to the past, regardless of household, commercial, or business use. As a result, suppliers who provide power need to manage how users consume electric power. Most power data are time series data, and an emerging problem is anomaly detection. Therefore, it is necessary to detect anomalies in users' electric power usage data in order to manage power efficiently. In this paper, we use the TadGAN model to detect outliers in power consumption. To evaluate performance, we specify thresholds using the ARIMA model and set a ground truth that considers values beyond them as outliers. We compared nine TadGAN hyperparameter settings against the ground truth to yield F1-scores.

Fourier Random Features and LSTM for Healthcare Irregular Pattern Detection
Ermal Elbasani, Jeong-Dong Kim (Sun Moon University, Korea)
[Abstract] In this era of continuous innovation, healthcare monitoring systems are changing continuously and moving closer to the individual for more detailed monitoring. Wearable technologies are becoming a subcategory of the personal health system that provides continuous health monitoring through non-invasive biomedical, biochemical, and physical measurements. An important topic in health monitoring is detecting irregular patterns that correspond to serious conditions; such detection raises awareness of individual health and helps prevent chronic disease. In this paper, a Random Fourier Features (RFF) model incorporates the Long Short-Term Memory architecture into kernel learning, which significantly boosts the flexibility and richness of detection while allowing deep kernel learning methods to perform well at different scales of data. The results indicate that the proposed model performs effectively and yields satisfactory accuracy in irregular pattern detection.
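The random-feature idea underlying this abstract can be illustrated in isolation. The sketch below shows the classic Random Fourier Features map, whose feature dot products approximate an RBF kernel; the LSTM pairing and the paper's actual data and dimensions are omitted, and all values here are made up for illustration.

```python
import math
import random

# Random Fourier Features: draw random frequencies w ~ N(0, 2*gamma*I) and
# offsets b ~ U[0, 2*pi]; then sqrt(2/D)*cos(w.x + b) gives features whose
# inner product approximates the RBF kernel exp(-gamma*||x - y||^2).

def rff_map(x, weights, offsets):
    """Map a vector x to D random Fourier features."""
    d_out = len(weights)
    return [math.sqrt(2.0 / d_out) *
            math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, offsets)]

random.seed(0)
d_in, d_out, gamma = 3, 2000, 0.5   # illustrative dimensions; gamma = kernel width
weights = [[random.gauss(0.0, math.sqrt(2 * gamma)) for _ in range(d_in)]
           for _ in range(d_out)]
offsets = [random.uniform(0.0, 2 * math.pi) for _ in range(d_out)]

x, y = [0.1, 0.2, -0.1], [0.0, 0.3, 0.1]
approx = sum(a * b for a, b in zip(rff_map(x, weights, offsets),
                                   rff_map(y, weights, offsets)))
exact = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
print(approx, exact)   # the two values should be close for large D
```

In a pipeline like the one the abstract describes, such feature vectors (rather than raw signals) would feed the downstream LSTM.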

An Augmented Reality based System for Interior Designing
Syeda Aqsa Fatima Rizvi, Mahnoor Javaid, Maryam Bukhari (COMSATS University Islamabad, Attock Campus, Pakistan)
[Abstract] In today's world, technology is developing steadily, and so is people's urge to surround themselves with the latest tech gadgets. To meet this demand, AR (Augmented Reality), a mixture of real and virtual reality, is known for its splendid work in interior design and for making shopping easier. It creates an environment where people can visualize interior design models virtually in their own physical spaces, rotating, dragging, and zooming in and out of the models at their convenience. All this can be done with the help of marker-less Augmented Reality and ARCore, a platform for the design and deployment of Augmented Reality applications. With this framework, people can sense the real environment, but not all applications serve users equally well; some applications that support augmented reality, such as SayDuck, fall short of what people expect. The ARCore framework addresses this in interior design applications, letting users view desired items virtually in their real environment before purchasing.


14:45 - 15:45 Session B-3

Room: Online Zoom Meeting Room Chair: Geun-Jun Yoo (Sumo Logic)

Video Anomaly Detection by the Combination of C3D and LSTM
Yuxuan Zhao, Gabriela Mogos (Xi'an Jiaotong-Liverpool University, China), Ka Lok Man (Xi'an Jiaotong-Liverpool University, China; Swinburne University of Technology Sarawak, Malaysia; imec-DistriNet, KU Leuven, Belgium; Kazimieras Simonavicius University, Lithuania; Vytautas Magnus University, Lithuania)
[Abstract] Video anomaly detection is a significant problem in computer vision. It requires methods to detect unusual events in videos, and the core of the task is producing a correct understanding of the input video. To achieve this, methods must extract both spatial and temporal features. Research in image processing has shown that deep convolutional neural networks perform well at spatial feature extraction, so the problem becomes how to capture temporal features in the video. This paper proposes a model that combines two effective temporal feature processing methods, Convolution 3D and Long Short-Term Memory, to handle video anomaly detection. We run experiments on a well-known video anomaly dataset, UCF-Crime, and achieve better performance than other methods.

A Study on Radar Target Detection and Classification Methodologies
Yu Du (Xi'an Jiaotong-Liverpool University, China), Jeremy S. Smith (University of Liverpool, UK), Eng Gee Lim, Ka Lok Man (Xi'an Jiaotong-Liverpool University, China)
[Abstract] Target detection and classification is the most important part of scenario analysis in autonomous driving and intelligent transportation systems (ITS). Moreover, robust all-weather sensing is one of the indispensable characteristics of the sensing elements in such systems, and radar has shown great potential as the major sensor for this task thanks to the environmental insensitivity of electromagnetic waves. However, unlike the informative images captured by cameras, reliable real-time classification and detection of objects by radar sensors has proved quite challenging. Toward this goal, a comprehensive study of radar perception methodologies is a good stepping stone. In this paper, major classical and state-of-the-art approaches are thoroughly summarized, and several potential deep neural network approaches are discussed in comparison with traditional methodologies.

Modeling of Mechatronic Systems with Multi-Axis
Liye Jia, Yujia Zhai (Xi'an Jiaotong-Liverpool University, China), Kejun Qian (Suzhou Power Grid Company, China), Sanghyuk Lee, Ka-Lok Man (Xi'an Jiaotong-Liverpool University, China)
[Abstract] Recently, the market has imposed rigorous requirements on accuracy and efficiency in the manufacturing industry, so multi-axis systems have been adopted in many advanced industrial devices to meet current demand. Using an independent servo motor to actuate each movable axis is the basic principle of the multi-axis mechanism. However, asynchronization problems can severely reduce the performance of such a mechanical system. Therefore, to compensate for the synchronization error caused by external disturbances, the multiple axes should be controlled simultaneously under synchronization control principles, and B&R provides the Evaluation and Training for Automation (ETA) workstation for verifying the control methods. Based on the practical structure of the ETA, this paper's objective is to create a digital dynamic model in MATLAB that replaces the workstation using the Digital Twin technique, so that the output of the virtual model can be applied as the control signal to govern the motion of the physical mechanism.

15:45 - 16:30 Coffee Break

Ocean Suites Jeju Hotel, 2nd Floor


16:30 - 17:00 Award Ceremony

Room: Canola Hall and Online Zoom Meeting Room Chair: Young-Ae Jung (Sun Moon University, Korea)

17:00-17:50 Invited Talk

Room: Canola Hall and Online Zoom Meeting Room Chair: Jae-Myeong Choi (Mokwon University)

Best practice for building low-latency and high-volume APIs with Serverless Cloud Computing, by Mashhur Sattorov, SW Development Engineer, Amazon, Canada

17:50-18:00 Closing Ceremony

Room: Canola Hall and Online Zoom Meeting Room Chair: Young-Ae Jung (Sun Moon University, Korea)

18:30 - 20:00 DCS 20th Anniversary Ceremony

Room: Canola Hall and Online Zoom Meeting Room Chair: Young-Chul Kim


Table of Contents

BA-AM: An Advanced Prediction Model for Business Problem Solving Creativity Using fNIRS
Min Gyeong Kim, Kun Chang Lee

Predicting User's Perceived Price Perception of e-commerce Products via Distinct UI Modes: A Neuroscience Mining Approach
Francis Joseph Costello, Kun Chang Lee

Emotional Image Recommendation System Based on Biometric Information and Collaborative Filtering
Tae-Yeun Kim, Young-Eun An

A combination of uniform label training strategy and AutoAugment policies to identify the category of foliar diseases in apple trees
Vo Hoang Trong, Le Hoang Anh, Lee JuHwan, Kim JinYoung

Development of virtual reality fire training contents based on standalone HMD
Eun-Jee Song, Hwa-Soo Jin

Fingerprint-based Cooperative Beam Selection for Cellular-Connected mmWave UAV communication
Sangmi Moon, Hyeonsung Kim, Intae Hwang

Intelligent Infection Prevention Monitoring System Through CCTV Video Analysis around a Confirmed Patient Distance
Dongsu Lee, Sang-Joon Lee, ASHIQUZZAMAN AKM, Seungmin Oh, Jihoon Lee, Yeonggwang Kim, Sangwon Oh, Jinsul Kim

A Study on the Improvement of Resource Utilization Using Deep Learning Applications in Virtualized Platforms
Seungmin Oh, Yeonggwang Kim, Sangwon Oh, Jinsul Kim

A Study of VR Interactive Storytelling using Planning
Su-ji Jang, Byung-Chull Bae

An Emotion Detection based Models using Electroencephalogram (EEG)
Wajiha Ilyas, Anum Moin, Muzammil Noor, Maryam Bukhari

A Trustable and Traceable Method for Secondhand Market based on Blockchain Technology with Committee Consensus
Dongkun Hou, Yinhui Yi, Jie Zhang, Ka Lok Man

A Feasibility Study of a Security Mechanism for Blockchain-Powered Intelligent Edge
Jie Zhang, Ka Lok Man, Young B. Park, Jieming Ma, Xiaohui Zhu

Smart Contract-based Operation Technique for Secure Accessing Distributed Contents
Yunhee Kang, Young B. Park

Study on the Future Prediction Service Based on Ultra-Low Power Communication Application using the Edge Platform
Yeonggwang Kim, Jinsul Kim

Application of TadGAN for detecting Electric Power Usage Anomaly
Sangwon Oh, Jinsul Kim

Fourier Random Features and LSTM for Healthcare Irregular Pattern Detection
Ermal Elbasani, Jeong-Dong Kim

An Augmented Reality based System for Interior Designing
Syeda Aqsa Fatima Rizvi, Mahnoor Javaid, Maryam Bukhari

Video Anomaly Detection by the Combination of C3D and LSTM
Yuxuan Zhao, Gabriela Mogos, Ka Lok Man

A Study on Radar Target Detection and Classification Methodologies
Yu Du, Jeremy S. Smith, Eng Gee Lim, Ka Lok Man

Modeling of Mechatronic Systems with Multi-Axis
Liye Jia, Yujia Zhai, Kejun Qian, Sanghyuk Lee, Ka-Lok Man


BA-AM: An Advanced Prediction Model for Business Problem Solving Creativity Using fNIRS

Min Gyeong Kim (SKK Business School, Sungkyunkwan University, Seoul, South Korea; mgkim9541 @.com)
Kun Chang Lee (SKK Business School / Department of Health Sciences & Technology, SAIHST (Samsung Advanced Institute for Health Sciences & Technology), Sungkyunkwan University, Seoul, South Korea; [email protected])

Abstract—In this paper, deep-learning-based business problem-solving creativity (BPSC) classification using functional near-infrared spectroscopy (fNIRS) is investigated. The creative brain signals from problem-solving were acquired from 28 healthy subjects while drawing a cognitive map. The brain activities were measured with a constant-wave fNIRS system, focusing on the ventrolateral prefrontal, dorsolateral prefrontal, frontopolar prefrontal, and orbitofrontal cortices. These cortex areas are based on the Brodmann areas, and each area is known to have a distinctive character. The Brodmann Area-based Attention Model (BA-AM) is a deep learning model powered by an attention mechanism and LSTM. BA-AM attempts to classify creativity, valence, and arousal. For training and testing the model, 5-fold cross-validation was applied to keep the model's performance consistent. The BA-AM architecture resulted in an average accuracy of 93.4%, showing the model to be capable of differentiating creative and non-creative states. The proposed approach is promising for detecting creativity successfully and contributes to future neuroscience mining research.

Keywords—BPSC; fNIRS; BA-AM; deep learning; neuroscience mining; valence; arousal

I. INTRODUCTION

This study predicts the subject's creativity level and emotional state through hemodynamic deep learning analysis, based on experimental research data on a new measure of creativity called BPSC (Business Problem Solving Creativity) for creative problem-solving in a business environment. Here, hemodynamics refers to oxygenated and deoxygenated hemoglobin: when HbO2 (oxy-haemoglobin) is relatively higher than HbR (deoxy-haemoglobin), the brain can be interpreted as more actively making decisions or searching for ideas. By applying these neuroscience research methods, we explore the cognitive relationship between BPSC and brain cognitive changes based on the Brodmann areas [1]. Furthermore, we check whether an attention model based on Brodmann's areas effectively predicts creativity and emotional state. As a neuroimaging measurement tool, functional near-infrared spectroscopy (fNIRS) was used. This study aims to examine whether BPSC can be effectively predicted based on data collected using fNIRS. We also determine whether the deep learning model successfully predicts the intensity of negative and positive emotions using fNIRS data.

II. RELATED WORK

A. BPSC

In a business environment, creativity leads to the same context as organizational decision-making when an organization proposes, materializes, and attempts an idea. Therefore, BPSC, which has an important effect on solving business problems, is known as a critical process for creating competitive advantage [2, 3]. The BPSC and fNIRS data used in this study are based on the following paper [4], in which the authors evaluate the business problem-solving capabilities that occur during the creation of cognitive maps. The evaluation was conducted by a doctor of business administration who studied creativity science and used the Torrance Tests of Creative Thinking (TTCT) method to score on four scales. The four scales of TTCT are (1) Fluency: the number of interpretable and meaningful ideas generated in response to stimuli; (2) Flexibility: the number of different categories in response to stimuli; (3) Originality: the number of ideas in response to stimuli; (4) Elaboration: stimulus. Consequently, five points were allocated per scale to assess a total of 20 points.

B. Brodmann Area

In this work, we want to identify the functional differences between zones through the Brodmann regions. The Brodmann regions were defined by the German anatomist Brodmann based on neurons in relation to the various functions of the cortex (Brodmann, 1909). Thus, in this study, we divided the channels into the Frontopolar cortex (Brodmann area 10), Ventral prefrontal cortex (Brodmann area 45), Dorsolateral prefrontal cortex (Brodmann area 9), and Orbitofrontal cortex (Brodmann area 11). Table I shows a summary of Brodmann areas 10, 45, 9, and 11 by location and functionality.


TABLE I. BRODMANN AREA, FUNCTIONALITY

Brodmann area 10 (Frontopolar cortex): Cognitive branching function, that is, a core function when performing multiple tasks at the same time or processing complex tasks [5]. (Koechlin and Hyafil, 2007)

Brodmann area 45 (Ventrolateral prefrontal cortex): Auxiliary role in evoking semantically relevant working memory through criteria appropriate to a given situation [6]. (Levy and Wagner, 2011)

Brodmann area 9 (Dorsolateral prefrontal cortex): Important role in maintaining concentration and managing working memory [8]. (Courtney, 2004)

Brodmann area 11 (Orbitofrontal cortex): Inductive reasoning and the core function of inferring the intentions of others in a given situation [7]. (Chevrier, Noseworthy and Schachar, 2007)

Image sources: a. https://psychology.wikia.org/wiki/Brodmann_area_10 ; b. https://psychology.wikia.org/wiki/Brodmann_area_45 ; c. https://psychology.wikia.org/wiki/Brodmann_area_9 ; d. https://psychology.wikia.org/wiki/Brodmann_area_11

Fig. 1. (a) fNIRS device source detector; (b) 30mm sensor; (c) 30mm sensor with Brodmann areas

C. fNIRS

The measurement site of fNIRS is the PFC, which greatly affects human emotions and decision-making [9] and is associated with executive function, since it handles decision-making and planning. In particular, in decision-making, the PFC and limbic system are involved in thinking, reasoning, calculation, and emotional aspects [10, 11] (Koechlin, Ody and Kouneiher, 2003; Ernst and Paulus, 2005; Cieslik et al., 2012). The fNIRS data in this study were collected using OBELAB's NIRSIT [12], a transcranial measurement device for the prefrontal cortex. In this study, only the 30mm sensor was used, and detailed information about the fNIRS device and the Brodmann areas used in the study is shown in Figure 1 [13].

III. METHODOLOGY AND RESULT

The characteristics of the collected data are as follows. For the measurement of cerebral blood flow dynamics, participants who took blood pressure medications, lacked sleep, or drank alcohol the day before the experiment, or who had a history of brain damage or psychiatric disorders, were judged inappropriate to participate and were excluded. In addition, the sample was composed of students of Korean nationality, most of whom were right-handed. The participants comprised 28 healthy adults (mean age = 23.89, SD = 1.641; 15 females) between the ages of 20 and 27. Participants completed a survey on their emotional state before the start of the experiment (initial valence and arousal). Then, following the guidance, participants solved the given business problem and drew a cognitive map in the process. The cognitive map was assessed by experts with a creativity score, which is the BPSC. After the end of the experiment, the emotional state was surveyed again (final valence and arousal).

Next, BA-AM, used for classification, is a multi-label classifier. Therefore, in addition to the commonly used accuracy, precision, recall, and F1-score, additional metrics are required. The most widely used metrics in multi-label analysis are Hamming loss and exact match. Hamming loss is defined in (1), and exact match in (2):

$\mathrm{Hamming\ Loss} = \frac{1}{N \cdot L} \sum_{i=1}^{N} \sum_{j=1}^{L} \mathrm{xor}\left(y_{i,j}, z_{i,j}\right)$  (1)

$\mathrm{Exact\ Match} = \frac{1}{N} \sum_{i=1}^{N} I\left(Y_i = Z_i\right)$  (2)

where $N$ is the number of samples, $L$ the number of labels, $y$ and $z$ the true and predicted labels, and $I$ the indicator function.
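The two multi-label metrics just described, Hamming loss and exact match, can be computed directly. Below is a minimal sketch using the standard definitions on illustrative binary label vectors (the data is made up for demonstration).

```python
# Hamming loss averages per-label mismatches over all N samples and L
# labels; exact match counts samples whose full label vector is predicted
# correctly. Y holds true labels, Z predicted labels (N x L binary lists).

def hamming_loss(Y, Z):
    """Fraction of individual labels that disagree, over N*L entries."""
    n, l = len(Y), len(Y[0])
    mismatches = sum(a != b for yi, zi in zip(Y, Z) for a, b in zip(yi, zi))
    return mismatches / (n * l)

def exact_match(Y, Z):
    """Fraction of samples whose entire label vector matches."""
    return sum(yi == zi for yi, zi in zip(Y, Z)) / len(Y)

# 4 samples, 5 labels each (e.g. BPSC, IV, IA, FV, FA as binary outcomes)
Y = [[1, 0, 1, 0, 1], [0, 0, 1, 1, 0], [1, 1, 0, 0, 1], [0, 1, 0, 1, 0]]
Z = [[1, 0, 1, 0, 1], [0, 1, 1, 1, 0], [1, 1, 0, 0, 1], [0, 1, 1, 1, 0]]
print(hamming_loss(Y, Z))   # 2 wrong labels out of 20 -> 0.1
print(exact_match(Y, Z))    # 2 of 4 samples fully correct -> 0.5
```

A low Hamming loss with a lower exact match, as in this toy example, indicates that errors are spread thinly across samples rather than concentrated in a few.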


A. BA-AM

BA-AM was created by referring to the conceptual elements of the Hierarchical Attention Network (HAN) model [14]. The HAN model is characterized by being composed of several LSTM or GRU layers. In general, HAN is used in text analysis, where its strength is analyzing with the hierarchy expanded to the sentence and paragraph levels. In this study, before analyzing brain regions, the left and right hemispheres of each Brodmann area were separated and trained in individual LSTM models, and a Brodmann Area-based Attention Model (BA-AM) was developed using an attention method. The conceptual diagram is shown in Fig. 2.

Fig. 2. BA-AM Structure

BA-AM divides the 68 channels into 8 zones and configures the first layer with 8 individual Bidirectional LSTMs. Each model takes the left or right hemisphere of the VPC, FPC, DPC, or OC as input. Since the number of channels differs between zones, each model has its own number of nodes, and finally each region goes through an attention layer. The attention layer reduces the bias of each node and prevents overfitting or underfitting. Then, a concatenate layer merges the 8 attention layers into one, which becomes the start of the second analysis layer. In the second layer, Bi-LSTM and attention mechanisms are applied again, and finally 5 kinds of output are predicted.

B. Result

The purpose of this experiment is to confirm the performance of BA-AM. For this, CNN, LSTM, and HAN models, which are known to have excellent analysis performance, are used as benchmarks. The CNN and LSTM models are designed to enable multi-input and multi-output with the functional Keras API. Similarly, the HAN model was also used as a multi-label classifier, and the main algorithms for constructing HAN are the Bidirectional LSTM and the attention mechanism. Table II shows the classification performance of the models. According to the results in Table II, the classifier that best classifies BPSC was BA-AM (92.74%), followed by LSTM, HAN, and CNN. This can be interpreted as evidence that dividing the brain into regions is meaningful for creativity classification. Next, for the prediction of initial valence and arousal, BA-AM showed the best performance (IV: 99.80, IA: 99.80) and HAN the poorest (IV: 93.37, IA: 92.72). Finally, CNN showed the poorest performance in FV and FA prediction. These results imply that input to HAN without data alignment does not improve predictive performance as expected.

TABLE II. CLASSIFICATION PERFORMANCE

Prediction Performance for the BPSC
Model | Accuracy% | Precision% | Recall% | F1-score%
CNN | 84.44 | 84.89 | 84.44 | 84.49
LSTM | 87.75 | 87.87 | 87.75 | 87.52
HAN | 86.62 | 89.01 | 86.72 | 87.02
BA-AM | 92.74 | 93.38 | 92.74 | 92.79

Prediction Performance for the Initial Valence (IV)
Model | Accuracy% | Precision% | Recall% | F1-score%
CNN | 94.12 | 94.61 | 94.12 | 94.05
LSTM | 94.86 | 95.40 | 94.86 | 94.93
HAN | 93.37 | 93.85 | 92.10 | 93.01
BA-AM | 99.80 | 99.73 | 99.82 | 99.77

Prediction Performance for the Initial Arousal (IA)
Model | Accuracy% | Precision% | Recall% | F1-score%
CNN | 95.00 | 95.08 | 95.00 | 94.99
LSTM | 94.90 | 95.04 | 94.90 | 94.86
HAN | 92.72 | 93.04 | 92.27 | 92.62
BA-AM | 99.80 | 99.76 | 99.81 | 99.79

Prediction Performance for the Final Valence (FV)
Model | Accuracy% | Precision% | Recall% | F1-score%
CNN | 88.11 | 90.05 | 88.11 | 88.11
LSTM | 94.86 | 95.55 | 94.86 | 94.04
HAN | 90.46 | 91.29 | 90.46 | 90.49
BA-AM | 99.76 | 99.78 | 99.71 | 99.75

Prediction Performance for the Final Arousal (FA)
Model | Accuracy% | Precision% | Recall% | F1-score%
CNN | 91.73 | 93.00 | 91.73 | 91.91
LSTM | 94.89 | 93.52 | 90.74 | 91.95
HAN | 94.17 | 93.40 | 92.27 | 92.33
BA-AM | 99.84 | 99.78 | 99.88 | 99.83

Next, Table III shows the multi-label classification performance. Looking at Hamming loss, which is known as the main error metric for multi-label classification, the error of BA-AM is the smallest (0.01612), followed by LSTM (0.06548), HAN (0.08532), and CNN (0.09323). What Hamming loss means is that misclassified labels are counted among all labels; since these results are computed per label, the small error indicates that the model is making a successful prediction for each label.
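The zone-wise attention pooling that BA-AM applies after its per-zone encoders can be sketched in isolation. The snippet below is a simplified illustration, not the authors' implementation: the Bi-LSTM encoders are omitted, the per-zone feature vectors and relevance scores are made-up numbers, and only the softmax-weighted pooling step is shown.

```python
import math

# Each of the 8 zones (left/right hemispheres of 4 cortical areas) yields
# a feature vector (standing in for a Bi-LSTM output). An attention layer
# turns per-zone scores into softmax weights and pools the zones into a
# single merged representation for the second analysis layer. In the real
# model the scores are learned end to end.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(zone_features, scores):
    """Weighted sum of zone feature vectors using softmax attention weights."""
    weights = softmax(scores)
    dim = len(zone_features[0])
    return [sum(w * feat[k] for w, feat in zip(weights, zone_features))
            for k in range(dim)]

# 8 zones, 4-dim features each (illustrative values)
zones = [[0.1 * z + 0.05 * k for k in range(4)] for z in range(8)]
scores = [0.2, 0.1, 0.9, 0.3, 0.4, 0.8, 0.2, 0.1]   # hypothetical relevance
merged = attention_pool(zones, scores)
print(len(merged))   # 4: one pooled vector feeds the second Bi-LSTM layer
```

Because the weights sum to one, the pooled vector stays on the scale of the zone features while emphasizing the zones with higher attention scores.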

Next, the exact match is a metric that counts a sample as correct only when all five labels are matched. Therefore, it can be seen that the CNN model correctly classified all five labels in 66.923% of the samples, while BA-AM did so in 92.58%.

TABLE III. MULTI-LABEL PERFORMANCE

Multilabel Classification Performance
Model | Hamming Loss | Exact Match (%)
CNN | 0.09323 | 66.923
LSTM | 0.06548 | 84.797
HAN | 0.08532 | 79.451
BA-AM | 0.01612 | 92.580

IV. CONCLUSION

In this paper, we explored creativity, valence, and arousal classification methods using fNIRS data. Based on the fNIRS study, we collected prefrontal oxygen saturation, creativity, and emotional data from participants. As a result of training the deep learning model on the fNIRS data divided according to the Brodmann areas, it was confirmed that the prediction performance was better than that of the other models. These results showed excellent performance compared with widely used deep learning models (CNN, LSTM, HAN), and revealed that an approach based on the different functions performed by brain regions is also meaningful for deep learning analysis. This research has ample room for further development by being grafted onto more neuroscience mining research in the future, and it is meaningful in that it can be applied to neuroimaging research such as high-resolution fNIRS, fMRI, and EEG.

ACKNOWLEDGMENT

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1F1A1074808).

REFERENCES

[1] K. Brodmann, "Vergleichende Lokalisationslehre der Großhirnrinde", 1909, pp. 324-324.
[2] J. Zhou and C. E. Shalley, "Deepening our understanding of creativity in the workplace: A review of different approaches to creativity research". APA Handbook of Industrial and Organizational Psychology, Vol. 1: Building and Developing the Organization, 2011, pp. 275-302.
[3] N. Anderson, C. K. De Dreu, and B. A. Nijstad, "The routinization of innovation research: A constructively critical review of the state-of-the-science". Journal of Organizational Behavior, 2004, 25.2: 147-173.
[4] J. K. Ryu and K. C. Lee, "Neuroscience-based Exploratory Approach to Measuring the Business Problem-solving Creativity from the Perspective of SIAM (Search for Ideas Associative Memory) Model: Emphasis on fNIRS (functional near-infrared spectroscopy) Method", Korean Academic Society of Business Administration, 2018, pp. 1111-1137.
[5] E. Koechlin and A. Hyafil, "Anterior prefrontal function and the limits of human decision-making". Science, 2007, pp. 594-598.
[6] B. J. Levy and A. D. Wagner, "Cognitive control and right ventrolateral prefrontal cortex: reflexive reorienting, motor inhibition, and action updating". Annals of the New York Academy of Sciences, 2011, p. 40.
[7] A. D. Chevrier, M. D. Noseworthy, and R. Schachar, "Dissociation of response inhibition and performance monitoring in the stop signal task using event-related fMRI". Human Brain Mapping, 2007, pp. 1347-1358.
[8] S. M. Courtney, "Attention and cognitive control as emergent properties of information representation in working memory". Cognitive, Affective, & Behavioral Neuroscience, 2004, pp. 501-516.
[9] E. Koechlin, C. Ody, and F. Kouneiher, "The architecture of cognitive control in the human prefrontal cortex". Science, 302.5648, 2003, pp. 1181-1185.
[10] M. Ernst and M. P. Paulus, "Neurobiology of decision making: a selective review from a neurocognitive and clinical perspective". Biological Psychiatry, 2005, pp. 597-604.
[11] J. K. Choi, J. M. Kim, G. Hwang, J. Yang, M. G. Choi, and H. M. Bae, "Time-Divided Spread-Spectrum Code-Based 400 fW-Detectable Multichannel fNIRS IC for Portable Functional Brain Imaging", IEEE J. Solid-State Circuits, pp. 484-495.
[12] J. Shin, J. Kwon, J. Choi, and C. H. Im, "Performance enhancement of a brain-computer interface using high-density multi-distance NIRS". Sci. Rep., 2017, 7:16545.
[13] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, "Hierarchical attention networks for document classification". In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1480-1489.


Predicting User's Perceived Price Perception of e-commerce Products via Distinct UI Modes: A Neuroscience Mining Approach

Francis Joseph Costello (SKK Business School, Sungkyunkwan University, Seoul, South Korea; [email protected])
Kun Chang Lee (SKK Business School / Department of Health Sciences & Technology, SAIHST (Samsung Advanced Institute for Health Sciences & Technology), Sungkyunkwan University, Seoul, South Korea; [email protected])

Abstract—User interface (UI) modes are being considered more in application and software design these days. Two years have passed since Android and iOS announced adoption of the dark UI mode, allowing users to decide on their preferred UI mode. This paper investigates the use of dark and light UI modes in online commerce and attempts to understand users' perceptions. We investigate this using EEG data obtained from participants and neuroscience mining (NSM) techniques to understand: (a) how consumers' perceived price perceptions of products differ between distinct UI modes; and (b) how neuromarkers of participants can explain these findings. Based on our results, we found that the use of the dark UI mode increased the perceived price of the products shown, and perceived price was predicted with the greatest accuracy using brain data obtained from the dark UI condition with an LSTM model. This paper is an example of the power of neuroscience mining in uncovering objective truths about people's intentions while interacting with distinct UI designs.

Keywords—user interface design; perceived perceptions; e-commerce; neuroscience mining

I. INTRODUCTION

Since 2019, consumers using Apple's iOS 13 and Android 10 have been allowed the option to switch between two UI designs: "Dark mode" and "Light mode" (see Fig. 1). In particular, the dark mode UI (hereafter DUI) has started to gain prominence amongst users. Evidence of this was seen when Google conducted a survey: according to one team developer, 82.7% of the 243 responders preferred the dark mode [1]. More recent evidence suggests that this trend is likely to continue, potentially leaving the traditional light mode UI (hereafter LUI) behind [2]. Due to the novelty of this UI design, its comparative advantages and disadvantages versus the LUI are not well established in business; nonetheless, research surrounding the benefits of the DUI's contrast was explored long before the official release from both Apple and Android. Aspects considered in this prior research ranged from legibility, aesthetics, energy-saving potential, semantic effects, emotions, and fatigue to visual acuity [2-5]. Thus, as can be seen, a greater examination of the effects of both UI modes is needed within the digital business setting. This is especially true for the field of IS, whereby such research can inform on IS usage within areas such as e-commerce. As most people interact with their smart devices on a regular basis, understanding behavioral responses (i.e., online shopping) to the use of distinct UI designs can help to inform decision-makers on how to approach their future designs.

Fig. 1. Example of the two UI designs.

A. User Interface Design

UI has been shown to be linked directly to a user's experience, and to encourage users to use their smart devices more. Such notions have been expressed predominantly in the marketing field, where colors in UI design have been shown to motivate, persuade, and create a bond between the consumer and the given product/service. Research has shown that this can lead to a higher purchase rate [6]. Furthermore, specific colors have been shown to evoke specific moods. For example, the color green is considered to be related to emotions of peace, whereas the color red can evoke both positive emotions of warmth and negative emotions of alert [7].


Dark vs. Light UI Mode and Perceived Perceptions

Prior research has found that dark versus light backgrounds in photography evoked various behavioral approaches toward politicians who were shown in photos. It was found that dark backgrounds evoked a negative attitude, whereas light backgrounds evoked a positive attitude towards politicians [8]. Work by Sullivan [9] portrayed books with dark versus light backgrounds and found that participants were less likely to assess the book "to be a work of genius" when the book was depicted alongside the dark background. These types of framing effects can also impact consumers. Studies have shown that framing advertisements and product perceptions in a different way can help to increase consumption [10].

As seen, extant literature has found that dark colors can be a proxy for certain framing effects (both positive and negative). These include higher perceptions of aspects such as luxury and expensiveness [11]. The characteristic of luxury is often measured by perceived price [12]. Therefore, we posit that when a consumer views a product online with the DUI used, they will perceive the product to be more luxurious and expensive looking; thus we hypothesize:

H1. Participants in the dark UI condition will have a higher perceived price of products.

Research has shown that the brain can be a good predictor of motivations, especially for behavior such as showing approach motivations towards products. For example, prior studies have found that the EEG alpha band is positively correlated with pleasantness judgments toward products when subjects were exposed to commercial content [13]. Following this research by Vecchiato et al. [13], we wanted to study the EEG alpha band to test such notions. Using neuroscience mining techniques, we should see an increased prediction performance towards perceived price within the participants' brain data. Thus, we hypothesize the following:

H2. EEG alpha band activations within participants conditioned in the dark UI group will have a higher prediction accuracy for perceived price.

II. METHODOLOGY

Thirty-one South Korean graduate and undergraduate students (mean age = 24.61; SD = 3.13; 10 females) joined the experiment for payment after seeing a notice that was posted on the university homepage. Permission was obtained from the IRB ethics board before any experiments took place. All the participants were right-handed and had no issues with their eyesight. The participants were randomly assigned to one of two conditions: DUI or LUI. A 29-inch computer screen was used to present the stimuli in a tablet-shaped design to imitate shopping on a tablet. For running the stimuli on a computer, we used the PsychoPy software [14]. Each condition had the same stimulus design except for the distinct UI types. In total, four product categories (fashion, electronic home devices, computer, jewelry) were used for testing (see Fig. 2).

Fig. 2. Dark UI and light UI stimuli used to represent online iPad shopping.

In this study we analyzed EEG data using the Emotiv EPOC+ Model 2.0 (Fig. 3). It is a 14-channel wireless headset with a sampling rate of 128 Hz. Electrodes are located based on 10-20 reference positions: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4. The electrode signals were passed through a 0.16 Hz cut-off frequency high-pass filter, pre-amplifier, and 83 Hz cut-off low-pass filter. The analog signals were then digitized at 2048 Hz, filtered using a digital 5th-order sinc filter (50-60 Hz), then low-pass filtered and down-sampled to 128 Hz. Next, these signals were Fourier transformed (Hamming window). Lastly, the alpha band was extracted from the data.

Fig. 3. EEG device and channel placement.

B. Neuroscience Mining Models Used

After the EEG data was preprocessed, it was used for neuroscience mining. Neuroscience mining helps to explore unseen patterns in neurobiological data in an attempt to predict the validity of this data. Usually, neuro data is explored based on regions of interest and traditional statistics; however, this neglects that the brain is known to work in a synchronized fashion across multiple areas during many tasks. Employing neuroscience mining allowed for an unbiased analysis of the participants while they undertook the experiment. We used machine learning and deep learning methods to help explore the data from each condition and find out which one has the most predictive power. More specifically, we used the following models in our analysis: Naïve Bayes (NB), Bayesian Network (BN), Decision Tree (DT), Random Forest (RF), Logistic (LOG), Support Vector Machine (SVM), k-Nearest Neighbor (kNN), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bi-LSTM, Hierarchical Attention Network LSTM (HAN) (for a description please refer to [15]).
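The preprocessing chain described above, a Hamming-windowed Fourier transform followed by alpha-band extraction, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the 128 Hz sampling rate comes from the text, while the 8-13 Hz alpha bounds and the function name are assumptions.

```python
import numpy as np

FS = 128          # sampling rate after down-sampling (Hz), from the text
ALPHA = (8, 13)   # conventional alpha band bounds (Hz); assumed, not stated in the paper

def alpha_band_power(signal):
    """Hamming-windowed FFT of one EEG channel, then mean power in the alpha band."""
    windowed = signal * np.hamming(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / FS)
    mask = (freqs >= ALPHA[0]) & (freqs <= ALPHA[1])
    return spectrum[mask].mean()

# Example: a 2-second epoch of synthetic 10 Hz (alpha-range) activity has far
# more alpha power than a 30 Hz oscillation of the same amplitude.
t = np.arange(2 * FS) / FS
epoch = np.sin(2 * np.pi * 10 * t)
print(alpha_band_power(epoch) > alpha_band_power(np.sin(2 * np.pi * 30 * t)))
```

In a full pipeline this per-channel band power would be computed for all 14 channels and fed to the classifiers described next.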


To help remove bias, each deep learning model was formatted exactly the same in terms of the number of layers and the number of nodes at each layer. The activation function was set to ReLU for all network layers, before a Softmax activation function was used for the output layer. All models were fed training and validation data for 50 epochs, before being fed test data for the final validation of the model's performance. Furthermore, to remove bias in the data, all models were tested using 10-fold cross validation.

III. RESULTS

The first analysis was conducted to find out the differences in perceived price between the two UI modes. This was done using the independent sample t-test model within the SPSS software. As seen in Fig. 4, we found a significant difference between both UI modes. For fashion: DUI mode (M = 209115.03, SD = 537885.8) compared to the LUI mode (M = 62368.3, SD = 61333.48) conditions; t(463) = -4.067, p = .000. For electronic home devices: DUI mode (M = 611993.97, SD = 922119.3) compared to the LUI mode (M = 328144.98, SD = 369676.13) conditions; t(463) = -4.304, p = .000. For computer: DUI mode (M = 440157.41, SD = 787919.32) compared to the LUI mode (M = 279124.26, SD = 394424.63) conditions; t(463) = -2.759, p = .006. For jewelry: DUI mode (M = 958815.54, SD = 1265873.53) compared to the LUI mode (M = 608631.15, SD = 692352.86) conditions; t(463) = -3.667, p = .000.

TABLE I. Prediction Performance for the Dark UI

Model     Accuracy%   Precision%   Recall%   F1-score%
NB        29.16       28.16        91.80     43.10
BN        36.28       37.37        36.97     37.15
DT        56.59       57.95        58.19     58.06
RF        81.77       80.10        85.69     82.80
LOG       33.27       31.08        41.43     35.52
SVM       29.16       31.34        38.87     06.89
CNN       37.17       35.11        33.63     28.87
LSTM      92.13       92.03        92.09     92.06
Bi-LSTM   94.60       94.50        94.57     94.54
HAN       80.24       80.48        79.70     80.00

Prediction Performance for the Light UI

Model     Accuracy%   Precision%   Recall%   F1-score%
NB        21.22       20.63        01.24     02.34
BN        30.12       26.82        49.89     34.87
DT        53.96       52.98        53.08     53.03
RF        81.95       83.27        82.31     82.78
LOG       30.41       33.77        01.73     03.29
SVM       29.98       85.18        00.15     00.35
CNN       31.01       29.73        27.39     20.80
LSTM      85.02       85.48        84.75     85.01
Bi-LSTM   66.95       71.45        65.78     67.11
HAN       49.57       50.80        48.29     48.67

t-test between Bi-LSTM and LSTM performance

Dark (Bi-LSTM)   Light (LSTM)   t-value
94.60            85.02          2.49**

**p<0.01
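The Bi-LSTM versus LSTM comparison above relies on an independent-samples t-test; a minimal sketch with SciPy is shown below. The per-fold accuracies are hypothetical (the source reports only the two means, 94.60 and 85.02), so the exact t-value will differ from the reported 2.49.

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold accuracies from 10-fold cross validation; only the
# means are reported in the paper, so these values are illustrative.
dark_bilstm = np.array([95.1, 94.2, 93.8, 95.6, 94.9, 94.0, 95.3, 93.9, 94.7, 94.5])
light_lstm  = np.array([85.6, 84.1, 86.0, 84.8, 85.4, 83.9, 85.9, 84.5, 85.2, 84.8])

# Independent-samples t-test between the two models' fold accuracies
t_stat, p_value = stats.ttest_ind(dark_bilstm, light_lstm)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```

The same call reproduces the behavioral analysis as well: `ttest_ind` over the DUI and LUI perceived-price samples corresponds to the SPSS independent sample t-test reported in the results.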

Next, we present the results of the neuroscience mining investigation. Each UI mode was separated for analysis so that a comparison could be made. As can be seen, the brain data from the participants who were conditioned in the DUI showed overall higher predictive success for price. As seen in Table 1, Bi-LSTM for the DUI showed the greatest accuracy of all the models, and when compared with the light mode's most accurate results from the LSTM model, the difference was shown to be statistically significant using a t-test (p<0.01). Thus, the neuroscience mining results added further evidence of the effect of the DUI on perceived price.

Fig. 4. Comparison of perceived price for both UI modes.

IV. DISCUSSION

In this paper, we have explored the perceived price perception of consumers when manipulated into two conditions: dark UI mode (DUI) and light UI mode (LUI). Based on an EEG study, we collected neural signals of participants as well as their behavioral data. We found significant support for the notion that the DUI improves the perceived price of products when consumers view them in an e-commerce setting. Next, we employed a novel neuroscience methodology to analyze the neural data obtained from the EEG study. We found that using a Bi-LSTM model we were able to significantly predict increased price perceptions based on the neural data obtained from the participants manipulated into the DUI condition. This was confirmed with a significant comparison with the most accurate model for the LUI


condition using a t-test. Overall, both results strongly suggest that the DUI aids in increasing the perceived price of products and thus should be considered by online marketers, especially if they are trying to sell hedonic products, such as jewelry and computers.

ACKNOWLEDGMENT

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1F1A1074808). The authors would also like to thank Rita Tolchinski for her assistance during the EEG experiment and for her insights into developing the ideas seen within this paper.

REFERENCES

[1] T. Steiner, Let there be darkness, Medium. (2019). https://medium.com/dev-channel/let-there-be-darkness-maybe-9facd9c3023d.
[2] L.A. Pedersen, S.S. Einarsson, F.A. Rikheim, F.E. Sandnes, User Interfaces in Dark Mode During Daytime – Improved Productivity or Just Cool-Looking?, in: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer International Publishing, 2020: pp. 178–187.
[3] A. Buchner, S. Mayr, M. Brandt, The advantage of positive text-background polarity is due to high display luminance, Ergonomics. 52 (2009) 882–886.
[4] D. Löffler, L. Giron, J. Hurtienne, Night Mode, Dark Thoughts: Background Color Influences the Perceived Sentiment of Chat Messages, Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 10514 LNCS (2017) 184–201.
[5] X. Xie, F. Song, Y. Liu, S. Wang, D. Yu, Study on the effects of display color mode and luminance contrast on visual fatigue, IEEE Access. 9 (2021) 35915–35923.
[6] R.H. Hall, P. Hanna, The impact of web page text-background colour combinations on readability, retention, aesthetics and behavioural intention, Behaviour and Information Technology. 23 (2004) 183–195.
[7] F. Hawlitschek, L.E. Jansen, E. Lux, T. Teubner, C. Weinhardt, Colors and trust: The influence of user interface design on trust and reciprocity, Proceedings of the Annual Hawaii International Conference on System Sciences. 2016-March (2016) 590–599. https://doi.org/10.1109/HICSS.2016.80.
[8] C. von Sikorski, The Effects of Darkness and Lightness Cues in the Visual Depiction of Political Actors Involved in Scandals: An Experimental Study, Communication Research Reports. 35 (2018) 162–171.
[9] K. Sullivan, Judging a book by its cover (and its background): effects of the metaphor intelligence is brightness on ratings of book images, Visual Communication. 14 (2015) 3–14.
[10] C. Chang, Ad framing effects for consumption products: An affect priming process, Psychology and Marketing. 25 (2008) 24–46.
[11] M.M. Aslam, Are you selling the right colour? A cross-cultural review of colour as a marketing cue, Journal of Marketing Communications. 12 (2006) 15–30.
[12] Y.J. Han, J.C. Nunes, X. Drèze, Signaling Status with Luxury Goods: The Role of Brand Prominence, Journal of Marketing. 74 (2010) 15–30.
[13] G. Vecchiato, J. Toppi, L. Astolfi, F. de Vico Fallani, F. Cincotti, D. Mattia, F. Bez, F. Babiloni, Spectral EEG frontal asymmetries correlate with the experienced pleasantness of TV commercial advertisements, Medical and Biological Engineering and Computing. 49 (2011) 579–583.
[14] J. Peirce, J.R. Gray, S. Simpson, M. MacAskill, R. Höchenberger, H. Sogo, E. Kastman, J.K. Lindeløv, PsychoPy2: Experiments in behavior made easy, Behavior Research Methods. 51 (2019) 195–203.
[15] I.H. Witten, E. Frank, M.A. Hall, C. Pal, Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, Morgan Kaufmann Publishers Inc, 2016.

Emotional Image Recommendation System Based on Biometric Information and Collaborative Filtering

Tae-Yeun Kim
National Program of Excellence in Software Center, Chosun University, Gwangju, Korea
[email protected]

Young-Eun An
IT Research Institute, Chosun University, Gwangju, Korea
[email protected]

Abstract—In this paper, we propose an image recommendation system that considers the user's emotions. We measure ECG and PPG, which are biological information of the user, then implement the recommendation system using a frequency analysis algorithm and the SVM-GA algorithm, and improve the reliability of the recommendation system by using collaborative filtering. We implemented a user-friendly interface to verify the emotion information of the images through the recommended values and measurement graphs of the emotional images in the mobile application, thereby enhancing the usability of the proposed system. Experimental results showed that the subjects' biometric information (ECG, PPG) was classified and learned by the SVM-GA algorithm into four kinds of emotion information. The average accuracy of the classified data was 89.2%. In addition, user satisfaction of 86.7% was measured, suggesting that the proposed emotion-based search results are comparable to the emotions felt by a person.

Keywords—Collaborative Filtering; Emotion; Frequency Analysis Algorithm; Recommendation System; SVM-GA Algorithm

I. INTRODUCTION

In order to reflect various requirements of users effectively, there is a need for a recommendation method which considers the user preference. In response, there have been numerous studies on different recommendation methods; representative methods are content-based recommendation and collaborative filtering. The content-based recommendation method directly analyzes the contents, analyzes the similarity between contents or between contents and the user preference, and recommends new contents based on the analysis results. On the other hand, collaborative filtering estimates the content preference of a certain user by analyzing other users who show similar preferences [1][2].

In particular, a recommendation method needs to understand and reflect the user's characteristics and situations, including personal preference and emotion, in order to enhance user satisfaction. However, most recommendation methods fail to enhance user satisfaction because such characteristics are not taken into consideration.

Emotion is a crucial characteristic in the recommendation method, and it means a complex condition that happens inside the body from the senses and perceptions that respond to external physical stimuli obtained from experience [3]. Emotion plays an important role in understanding the essence of humans because it influences numerous parts of life such as learning ability, behavior, and judgment [4]. A biosignal holds the weakness of non-specificity because it is highly sensitive to surrounding conditions. Still, biosignals are capable of not only identifying changes in emotion serially but also figuring out how emotion influences humans physiologically [5]. Therefore, biosignals are used as an important indicator in emotion-related research, and numerous technologies have been applied and developed using biosignals.

In response, this study constructs an image recommendation system that utilizes biosignal-based emotion information of personal characteristics in order to enhance user satisfaction. The system measures and analyzes the biometric information (ECG, pulse wave) of the user to recommend images depending on four kinds of emotion information (normal, sadness, surprise, happiness). To enhance the reliability of the recommendation system, the system perceives each user's emotion for each image to extract an emotion preference, and predicts a user's emotion preference for images from users with similar emotion preferences. In other words, this study creates a system which utilizes emotion information analyzed from the biometric signals (ECG, pulse wave) of the user and emotion preference predicted from collaborative filtering to recommend image data matched to the four emotions (normal, sadness, surprise, happiness), and verifies the reliability of the system [6].

II. SYSTEM CONFIGURATION AND DESIGN

To recommend the emotion image in consideration of the biometric information of the user, the system largely divides the emotion information into four cases of normal, sadness, surprise, and happiness, and it uses the SVM-GA algorithm to normalize and learn the emotion information from the biometric information [7]. Also, emotion preference was predicted by using collaborative filtering. The system recommends the image that matches the emotion of the user by matching the derived emotion information and emotion preference.
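The collaborative filtering step, predicting a user's emotion preference for an image from users with similar preferences, can be sketched as user-based filtering with cosine similarity. The rating matrix, function name, and similarity-weighted averaging are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical user-by-image emotion-preference matrix (rows: users,
# columns: images); 0 marks an unrated image. Values are illustrative only.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
], dtype=float)

def predict(user, item):
    """Predict a user's preference for an item from users who rated it,
    weighting their ratings by cosine similarity to the target user."""
    others = np.where(R[:, item] > 0)[0]
    sims = np.array([
        R[user] @ R[o] / (np.linalg.norm(R[user]) * np.linalg.norm(R[o]))
        for o in others
    ])
    return sims @ R[others, item] / sims.sum()

print(round(predict(user=0, item=2), 2))
```

In the proposed system this predicted preference would be combined with the SVM-GA emotion label so that only images matching the user's current emotion are ranked.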


The image recommendation system that utilizes emotion information and collaborative filtering is largely composed of a mobile device, an emotion classification module, and a collaborative filtering module.

The mobile device provides the interface of the recommendation system matched based on the emotion information and emotion preference of the user. The emotion classification module uses a biosensor to measure the biometric information (ECG, pulse wave) of the user, extracts and analyzes the characteristics of the measured data, and utilizes the information classified into the four standardized stages (normal, sadness, surprise, happiness) to extract the emotion information of the user. In addition, the collaborative filtering module extracts the emotion preference information of the user by using the emotion preference of the users who felt similar emotions. By using the information extracted from the emotion classification module and the collaborative filtering module, the system recommends images matched based on the bio-emotion information and emotion preference information of the user.

Figure 1 is a diagram of the bio-emotion information and emotion preference-based image recommendation system suggested in this study.

Fig. 1. Configuration of system

III. SYSTEM IMPLEMENTATION AND PERFORMANCE EVALUATION

For the personalized emotion information, this study used a biosensor, BIOPAC MP150, to measure the biosignals of 10 subjects, and the biosignal was used in five-minute intervals. The first one minute was set as the stabilization stage and was not used for analysis. In the remaining four minutes, the mean amplitude of the signal was calculated to detect 256 peaks (minute three to minute four) and the frequency was analyzed. Also, in order to induce the emotions of subjects smoothly, subjects were made to take a 30-minute break before emotion induction to keep the baseline of the biosignal at a constant level. After the 30-minute break, the subjects participated in the experiment. In addition, the subjects were made to watch a documentary video for 15 minutes before being given the emotion induction image, to measure the reference and minimize individual differences.

To select the SVM algorithm with the optimum condition for the biometric data analyzed through the frequency analysis algorithm, this study controlled r, c, and d and selected a kernel function. For the variables of the polynomial kernel function, r was changed from 0 to 1 by 0.1 while d was changed from 0 to 10 by 1. The maximum performance was 83.87% when r was set to 1, while the maximum performance was 85.42% when d was set to 1. For the variable of the radial basis kernel function, the maximum performance was 86.13% at r = 1, obtained by changing the variable r from 0 to 1 by 0.1. In the sigmoid kernel function, the accuracy ranged from 0.2% to 82.52% when r was changed from 0 to 1, while the accuracy ranged from 0% to 80.86% when c was changed from -1 to 1 [8][9].

TABLE I. ACCURACY OF EACH KERNEL FUNCTION

Kernel Function         Accuracy (%)
Linear Function         82.97
Polynomial Function     85.42
Radial Basis Function   86.13
Sigmoid Function        82.52

Table 1 indicates the accuracy of each kernel function; the radial basis kernel function demonstrates the best performance at 86.13%. Accordingly, this study used the radial basis function (r = 1), which had the highest accuracy, as the kernel function.

Also, this study conducted 10-fold cross validation to minimize the influence of the learning data on the experiment results and to secure reliability. The experiment data was classified into evaluation data and verification data in a 7:3 ratio. Also, the system was optimized under the leave-one-out method. Table 2 indicates the accuracy of the verification data (unit: %) obtained from the experiment.

TABLE II. PERFORMANCE EVALUATION OF DATA

Fold No   Accuracy (%)
1         88.1
2         85.7
3         87.3
4         87.1
5         90.5
6         92.3
7         90.8
8         89.1
9         89.5
10        86.9
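The kernel comparison above can be sketched with scikit-learn's SVC and 10-fold cross validation. The data here is synthetic (the paper's frequency-analysis features are not available), and scikit-learn names the tuned variables differently (gamma, coef0, degree rather than r, c, d), so this is an assumed illustration rather than the authors' setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the frequency-analysis features; 4 emotion classes
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

# Compare the four kernel functions with 10-fold cross validation
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=10)
    print(f"{kernel}: {scores.mean() * 100:.2f}%")
```

A grid over gamma (the analogue of r in the radial basis kernel) would reproduce the parameter sweep described in the text; the best-scoring kernel is then fixed for the final classifier.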


This study made a mobile application with an image recommendation system utilizing the bio-emotion information of the user. The system uses the emotion information data perceived through collaborative filtering to predict the user preference and recommend the image. Figure 2 shows the bio-emotion information based image recommendation system suggested in this study. In Figure 2, (a) is the recommended emotion image and measured value, (b) is the list of recommended emotion images, and (c) is the graph for emotion image measurement.

In the system suggested in this study, the emotion classification module standardizes the bio-emotion information and image emotion information into four stages for matching based on the emotion information of the user. Also, the system uses the collaborative filtering module to analyze the selection pattern of the user, search for preferred images in the emotion database, and recommend them in order. Here, the user is recommended the optimized image by expressing the emotion analysis diagram with the analyzed images.

Fig. 2. Emotional image recommendation system: (a) recommended emotion image and measured value; (b) list of recommended emotion images; (c) graph for emotion image measurement.

For the evaluation of the system recommendation results, as the emotion information is subjective, the results were measured based on the user satisfaction toward the reliability of the search results based on the emotion information. For the evaluation, 30 sample users were selected and given four emotion keywords. Then, their satisfaction toward the search results was evaluated. The mean value of satisfaction toward the emotion information of the searched images was 86.7%. This demonstrates that the suggested emotion information-based search results are relatively similar to the emotions that people actually feel.

IV. CONCLUSIONS

For the image recommendation system in consideration of the user emotion, this study created a system which recommends the image that matches the user emotion by measuring the biometric information, including ECG and pulse wave, and utilizing the frequency analysis algorithm and SVM-GA algorithm. The system also uses collaborative filtering to enhance the reliability of the recommendation system. To enhance the usability of the suggested system, this study made a user-friendly interface to identify the emotion information of the image through the measurement value and measurement graph for the emotion image recommended by the mobile application.

In the future, this study seeks to enhance the performance of the suggested system by improving the performance of the suggested algorithms and researching various classification algorithms. Also, there needs to be research on a recommendation system that utilizes not only emotion images but also various forms of emotion information, and this study seeks to make a system with a more stabilized recognition rate.

ACKNOWLEDGMENT

This research was supported by the Healthcare AI Convergence Research & Development Program through the National IT Industry Promotion Agency of Korea (NIPA) funded by the Ministry of Science and ICT (No. S1601-20-1041). This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2017R1A6A1A03015496).

REFERENCES

[1] B. H. Oh, J. H. Yang and H. J. Lee, "A Hybrid Recommender System based on Collaborative Filtering with Selective Utilization of Content-based Predicted Ratings," Journal of KIISE, vol. 41, no. 4, pp. 289-294, Apr. 2014.
[2] H. S. Kim, J. H. Lee and H. D. Lee, "Development of personalized clothing recommendation service based on artificial intelligence," Smart Media Journal, vol. 10, no. 1, pp. 116-123, Mar. 2021.
[3] S. Z. Lee, Y. H. Seong and H. J. Kim, "Modeling and Measuring User Sensitivity for Customized Service of Music Contents," Journal of Korean Society For Computer Game, vol. 26, no. 1, pp. 163-171, Mar. 2013.
[4] M. Li and B. L. Lu, "Emotion Classification Based on Gamma-band EEG," 31st Annual International Conference of the IEEE EMBS, pp. 1323-1326, USA, Sep. 2009.
[5] X. Niu, L. Chen and Q. Chan, "Research on Genetic Algorithm based on Emotion recognition using physiological signals," 2011 International Conference on Computational Problem-Solving (ICCP), pp. 614-618, 2011.
[6] T. Y. Kim, H. Ko, S. H. Kim and H. D. Kim, "Modeling of Recommendation System Based on Emotional Information and Collaborative Filtering," Sensors, vol. 21, no. 6, Mar. 2021.
[7] H. Yang and S. Wu, "EEG classification for BCI based on CSP and SVM-GA," Applied Mechanics and Materials, vol. 459, pp. 228-231, Trans Tech Publications Ltd., Oct. 2014.
[8] P. Michel and R. El Kaliouby, "Facial expression recognition using support vector machines," The 10th International Conference on Human-Computer Interaction, Crete, Greece, 2005.
[9] V. Sathya and T. Chakravarthy, "Emotional Based Facial Expression Recognition using Support Vector Machine," International Journal of Engineering and Technology, vol. 9, no. 6, pp. 4232-4237, Dec. 2017.

A combination of uniform label training strategy and AutoAugment policies to identify the category of foliar diseases in apple trees

Vo Hoang Trong*, Le Hoang Anh, Lee JuHwan, Kim JinYoung
Department of ICT Convergence System Engineering, Chonnam National University, Gwangju, Republic of Korea
*[email protected]

Abstract—Applying a conventional training method on an imbalanced dataset makes the Convolutional Neural Network (CNN) model biased toward majority classes. In this paper, we propose a uniform label training strategy, in which the distribution of labels in a single batch is uniform. This strategy allows the model to learn samples from minority classes more frequently. Furthermore, to increase the variety of samples in minority classes, we apply the AutoAugment policies collected from the ImageNet dataset as a data augmentation technique, in which samples from minority classes have more opportunities to be augmented than those from majority classes. Finally, we experiment with EfficientNet B0 on the Plant Pathology dataset from the Plant Pathology 2021 Challenge on Kaggle. Applying the Learning Rate (LR) range test to find the optimal learning rate in each model, results on the Kaggle scoreboard show that our strategy gets 0.74761 on EfficientNet B0.

Keywords—uniform label training, AutoAugment, cyclical learning rates, deep neural network, imbalance classification, disease classification

I. INTRODUCTION

Apples are one of the most important temperate fruit crops in the world. However, foliar (leaf) diseases pose a major threat to the overall productivity and quality of apple orchards. The current process for disease diagnosis in apple orchards is based on manual scouting by humans, which is time-consuming and expensive.

Although computer vision-based models have shown promise for plant disease identification, some limitations need to be addressed. Large variations in visual symptoms of a single disease across different apple cultivars, or new varieties that originated under cultivation, are major challenges for computer vision-based disease identification. These variations arise from differences in natural and image capturing environments, for example, leaf color and leaf morphology, the age of infected tissues, non-uniform image background, and different light illumination during imaging [1]. Besides, an imbalanced dataset is also a major challenge in many plant disease identification problems. This kind of dataset can reduce the performance of models on minority classes, because the high variety of samples in majority classes helps models easily generalize feature characteristics, hence biasing them toward these classes.

To reduce this kind of bias, we propose a uniform label training strategy, in which samples in minority classes have more training time than those in majority classes. We can do this by distributing labels in each batch to be uniform: all classes have an equal number of samples. To solve the lack of variety in minority classes, we apply the AutoAugment policies as the data augmentation techniques. These are the optimal techniques collected by using reinforcement learning on the ImageNet dataset. The probability that a sample is augmented is inversely proportional to the ratio of the number of samples in the corresponding class to the whole training set.

We experiment with this strategy on the Plant Pathology 2021 Challenge on Kaggle using the EfficientNet B0 model. Using the LR range test to determine the optimal learning rates shows almost no difference between using and not using data augmentation. Compared with the conventional training approach, results on Kaggle show that our strategy performs slightly better on the F1 score (0.74761 versus 0.69548).

II. DATASET

Figure 1. Distribution of classes in the Plant Pathology dataset.

The Plant Pathology dataset [2] contains 18632 high-quality RGB color images belonging to 12 classes. Figure 1 shows the distribution of classes. It shows that this dataset is imbalanced: the largest classes, healthy and scab, have nearly 5000 images, while frog_eye_leaf_spot_complex,


disease The LR range test in CLR is a suitable tool to find a suitable range of learning rates so that the model may converge. Figure 3 illustrates the LR range test. First, we define an upper and lower bound on our learning rate, where the lower bound should be very small (e.g. 10−10 ) and the upper bound should be very large (101 ). Then, we increase the learning rate exponentially after each minibatch update. In this figure, the suitable learning rate range is [10−5 , 10−2 ]. Figure 2. Images examples from the Plant Pathology dataset.

rust_frog_eye_leaf_spot, powdery_mildew_complex, and rust_complex are minority classes that only have around 250 images.

Figure 2 shows examples from four classes. The critical features appear on the leaf surface, such as rounded gray regions on Apple rust and green circles on Apple scab and Multiple disease. Furthermore, Multiple disease may contain characteristics of both Apple rust and Apple scab. These features suggest training on large images so that the model can conveniently capture them.

III. CNNS

Because of the small number of samples in this dataset, we use EfficientNet B0 to avoid overfitting during training. This model [3] has 5,330,571 parameters and is 29 MB in size. Its authors argued that deeper CNN models capture richer and more complex features and generalize well on new tasks, while wider networks tend to capture more fine-grained features and are easier to train. In addition, with higher-resolution input images, a model can potentially capture more fine-grained patterns. The authors used Neural Architecture Search to scale network width, depth, and resolution with a set of fixed scaling coefficients.

IV. LR RANGE TEST

Cyclical Learning Rate (CLR) [4] is a technique to estimate a suitable range of learning rates so that the model may converge quickly. With a learning rate that is too small, the model spends a long time converging, while a learning rate that is too large may make the loss explode.

Figure 3. Illustration of the LR range test [5].

V. AUTOAUGMENT

AutoAugment [6] is an automated approach to find data augmentation policies from data. The authors of [6] apply reinforcement learning to find the best augmentation policy. Figure 4 shows the AutoAugment framework, which searches over many data augmentation operations, such as image rotation, translation, auto contrast, etc. Reinforcement learning finds the best augmentation policy by applying a recurrent neural network controller that predicts augmentation policies from the search space S. These policies are applied to a child network, which yields a reward R: the accuracy of that network. Finally, R is used with the policy gradient method to update the controller, generating better policies.

Figure 4. AutoAugment framework [6].

Table 1 shows the 25 data augmentation policies found on the ImageNet dataset. Given an image, we randomly select 1 of the 25 policies. Each policy has two operations.

Table 1. AutoAugment policies on the ImageNet dataset [6].
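To make the policy structure concrete, the sampling scheme can be sketched as follows. This is only an illustration: the sub-policies below are stand-ins for the real Table 1 entries, and the string-building "transforms" are placeholders for actual image operations.

```python
import random

# Illustrative stand-ins for Table 1 sub-policies: each sub-policy is two
# (operation, probability, magnitude) triples. The real table has 25 entries.
POLICIES = [
    [("rotate", 0.8, 2), ("auto_contrast", 0.6, 0)],
    [("translate_x", 0.4, 3), ("invert", 0.2, 0)],
    [("shear_y", 0.5, 4), ("rotate", 0.9, 1)],
]

def apply_sub_policy(image, sub_policy, rng):
    """Apply each operation of one sub-policy with its own probability."""
    for name, prob, magnitude in sub_policy:
        if rng.random() < prob:
            # placeholder for the real transform, e.g. a PIL/OpenCV call
            image = f"{name}({magnitude})->{image}"
    return image

def autoaugment(image, rng):
    """Randomly select one sub-policy from the set and apply it."""
    return apply_sub_policy(image, rng.choice(POLICIES), rng)
```

A call such as `autoaugment("leaf.png", random.Random(42))` would pick one sub-policy and apply each of its two operations with the listed probability, mirroring the per-image sampling described above.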

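The LR range test (Section IV) can be summarized in code. The sketch below assumes the loss has already been recorded while the learning rate was increased step by step; the selection heuristic (steepest drop for the lower bound, loss minimum for the upper bound) is one common reading of the curve, not the exact rule of [4], [5].

```python
def lr_range_from_curve(lrs, losses):
    """Pick a plausible (min_lr, max_lr) from a recorded LR-vs-loss curve.

    lrs: learning rates tried, in increasing order.
    losses: the loss observed at each learning rate.
    """
    # lower bound: the LR with the steepest drop in loss to the next step
    drops = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]
    lo = lrs[max(range(len(drops)), key=drops.__getitem__)]
    # upper bound: the LR at the loss minimum, just before divergence
    hi = lrs[min(range(len(losses)), key=losses.__getitem__)]
    return lo, hi
```

On a curve shaped like Figure 3 (flat loss, then a falling region, then an explosion), this recovers the falling region as the usable range.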
13 2021 International Conference on Digital Contents: AICo(AI, IoT and Contents) Technology (DigiCon-21)

Each operation has three arguments: the data augmentation method, the probability of choosing this operation, and the magnitude. We apply two operations to augment each image. Figure 5 shows examples of applying the 25 sub-policies to a leaf disease image.

Figure 5. Application of AutoAugment policies. From left to right, top to bottom: original image, sub-policies 0 to 24.

By applying AutoAugment to reduce the bias toward majority classes, samples belonging to minority classes have a higher chance of being augmented than those in majority classes. Given a sample belonging to class i, whose share of the data is P[i] = A[i] / Σ_j A[j], the probability of this sample being augmented is 1 − P[i]. The smaller the class, the higher the chance of augmentation.

VI. UNIFORM LABEL DIVISION

Algorithm 1 shows the algorithm for uniform label division. It requires two input arguments: the size of a batch and the training set. To divide samples from all classes uniformly into a single batch, Line 1 calculates the minimum number of samples per class (call this number perImgBatch). Note that this strategy only works if the batch size is larger than the number of classes. Since the number of classes may not divide the batch size evenly, Line 2 calculates the remainder after division (rem). For each class, in Line 8 we start dividing by removing randomly selected samples from the class. If the number of samples left in a class is smaller than perImgBatch, we restore its removed samples (Lines 9-10 and Lines 17-18). At this point, the batch has rem empty positions left, so we fill them with the same process, but taking only one randomly selected sample per class, in order from the smallest to the largest class. The process finishes when all samples in the largest class have been divided, so the total number of batches in an epoch is equal to the ratio of the number of images in the largest class to perImgBatch, as shown in Line 3.

Algorithm 1. Algorithm for uniform label division.
Input:
  batchSize: size of a batch.
  A: the training set, with one element per class; element A[c] contains all samples of class c.
 1: perImgBatch ← ⌊batchSize / |A|⌋
 2: rem ← batchSize − perImgBatch · |A|
 3: numberOfBatches ← ⌈max_c |A[c]| / perImgBatch⌉
 4: lstData ← A
 5: newData ← an empty list
 6: for i from 0 to numberOfBatches:
 7:   create an empty list dtBatch
 8:   for c from 0 to |A|:
 9:     if |lstData[c]| < perImgBatch:
10:       lstData[c] ← A[c]
11:     shuffle lstData[c]
12:     sampleSelect ← lstData[c][:perImgBatch]
13:     add sampleSelect to dtBatch
14:     delete lstData[c][:perImgBatch]
15:
16:   for r from 0 to rem:
17:     if |lstData[r]| < 1:
18:       lstData[r] ← A[r]
19:     shuffle lstData[r]
20:     sampleSelect ← lstData[r][:1]
21:     add sampleSelect to dtBatch
22:     delete lstData[r][:1]
23:   shuffle dtBatch
24:   add dtBatch to newData
Output: return newData

VII. EXPERIMENTS

We trained the model using the Keras library in TensorFlow version 2, using an NVIDIA Tesla K40c graphics processing unit to accelerate training. Since the ImageNet dataset contains leaf images, we applied transfer learning, using parameters trained on that dataset as the initial parameters. These parameters were optimized using stochastic gradient descent with Nesterov momentum. The batch size was 32, and images were resized to 256×256. We trained the models for 25 epochs in both conventional training and uniform label division. Because of the small number of samples, we used an additional batch normalization layer to avoid overfitting. Applying the LR range test to EfficientNet B0 in Figure 6 shows that, regardless of whether AutoAugment was used, the suitable learning rate range was [10^-3, 10^-1].
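Algorithm 1 can be transcribed almost line-for-line into Python. This is a sketch: variable names follow the pseudocode, and the training set `A` is represented as a list of per-class sample lists.

```python
import math
import random

def uniform_label_division(batch_size, class_samples, seed=0):
    """Algorithm 1: build batches in which classes are uniformly represented.

    class_samples: list of per-class sample lists (class_samples[c] holds all
    samples of class c). Requires batch_size >= number of classes.
    """
    rng = random.Random(seed)
    num_classes = len(class_samples)
    per_img_batch = batch_size // num_classes                  # Line 1
    rem = batch_size - per_img_batch * num_classes             # Line 2
    largest = max(len(s) for s in class_samples)
    number_of_batches = math.ceil(largest / per_img_batch)     # Line 3
    lst = [list(s) for s in class_samples]                     # Line 4
    new_data = []                                              # Line 5
    # smallest-to-largest class order for filling the remainder slots
    order = sorted(range(num_classes), key=lambda c: len(class_samples[c]))
    for _ in range(number_of_batches):                         # Line 6
        batch = []                                             # Line 7
        for c in range(num_classes):                           # Line 8
            if len(lst[c]) < per_img_batch:                    # Line 9
                lst[c] = list(class_samples[c])                # Line 10: restore
            rng.shuffle(lst[c])                                # Line 11
            batch += lst[c][:per_img_batch]                    # Lines 12-13
            del lst[c][:per_img_batch]                         # Line 14
        for r in order[:rem]:                                  # Line 16
            if not lst[r]:                                     # Line 17
                lst[r] = list(class_samples[r])                # Line 18: restore
            rng.shuffle(lst[r])                                # Line 19
            batch.append(lst[r].pop())                         # Lines 20-22
        rng.shuffle(batch)                                     # Line 23
        new_data.append(batch)                                 # Line 24
    return new_data
```

For example, with class sizes (10, 4, 3) and a batch size of 8, perImgBatch is 2 and rem is 2, so every batch carries two samples of each class plus two extras from the smallest classes, and an epoch has ⌈10/2⌉ = 5 batches.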


Our strategy had better performance than the conventional training method in the Plant Pathology 2020-FGVC7 challenge competition on Kaggle. In this competition [1], the evaluation metric is the F1 score. Our uniform label training strategy on the light EfficientNet B0 achieved a score of 0.7476, higher than the conventional method's score of 0.6955. However, our strategy shows poor performance on the local dataset due to the lack of data. Besides, some data augmentation techniques shown in Table 2 may diminish critical features on the leaf surfaces, which was harmful to the model's ability to generalize on the dataset.

Figure 6. LR range test of the EfficientNet B0 model. (a) Conventional approach, not using data augmentation. (b) Uniform label training strategy with AutoAugment.
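As a reminder, the F1 score used as the competition metric is the harmonic mean of precision and recall. The per-class definition can be computed from true positives, false positives, and false negatives (this is the generic definition, not the competition's exact multi-class averaging):

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For instance, 8 true positives with 2 false positives and 2 false negatives gives precision = recall = 0.8 and hence F1 = 0.8.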

We selected 10^-2 as the global learning rate because this value had a high rate of reduction in the loss value. We divided the dataset into 80% for training and 20% for testing. During training, we kept the parameters that had the highest F1 score on the test set. Figures 7 and 8 graph the performance of EfficientNet B0 under conventional training and under the uniform label training strategy with AutoAugment. Both training methods suffered from extreme overfitting. However, the uniform label training strategy lets the training curve approach the F1-score curve and converge faster. Table 2 compares the performance of conventional training with the uniform label training strategy.

VIII. CONCLUSIONS

This paper proposed an approach to train a CNN model on an imbalanced dataset of foliar diseases in apple trees. To avoid bias toward majority classes, we collect samples into batches so that classes are distributed uniformly within each batch; hence, samples in minority classes get more training time than those in majority classes. We applied AutoAugment policies learned on the ImageNet dataset as the data augmentation technique to diversify samples within classes: the smaller the class, the higher the chance that its samples are augmented. We used EfficientNet B0 to train on this dataset. Applying the LR range test from CLR to find a suitable learning rate range showed almost no difference between conventional training and our strategy for a single model. The results show that the model suffered from overfitting, but the uniform label training strategy helped it converge faster. On the F1 score in the Kaggle competition, training EfficientNet B0 with the uniform label training strategy achieved the highest score.

Figure 7. Performance of EfficientNet B0 with the conventional training method.

ACKNOWLEDGEMENT

This work was carried out with the support of the "Cooperative Research Program for Agriculture Science and Technology Development (Project No. PJ01385501)", Rural Development Administration, Republic of Korea.

REFERENCES

Figure 8. Performance of EfficientNet B0 with the uniform label training strategy with AutoAugment.

Table 2. Results comparison. 'c' stands for results using conventional training; 'u' stands for the uniform label training strategy.

Model    Test loss  Test acc.  Test F1  Kaggle
Eff (c)  0.5701     0.8732     0.5903   0.6955
Eff (u)  1.0578     0.8614     0.5624   0.7476

[1] FineGrainedVisualCat, "Plant Pathology 2021 - FGVC8 - Problem Statement," Kaggle, 2021. [Online]. Available: https://www.kaggle.com/c/plant-pathology-2021-fgvc8/overview.
[2] R. Thapa, K. Zhang, N. Snavely, S. Belongie, and A. Khan, "The Plant Pathology Challenge 2020 data set to classify foliar disease of apples," Applications in Plant Sciences, vol. 8, no. 9, 2020.
[3] M. Tan and Q. V. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in International Conference on Machine Learning, 2019.
[4] J. Jordan, "Setting the learning rate of your neural network," [Online]. Available: https://www.jeremyjordan.me/nn-learning-rate/.
[5] L. N. Smith, "Cyclical learning rates for training neural networks," in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), 2017.
[6] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, "AutoAugment: Learning augmentation strategies from data," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.

Development of virtual reality fire training contents based on standalone HMD

Eun-Jee Song
Dept. of Computer Science, Namseoul University, Cheonan-city, Korea
[email protected]

Hwa-Soo Jin
Dept. of Virtual/Augmented Reality, Graduate School, Namseoul University, Cheonan-city, Korea
[email protected]

Abstract—Now, it is necessary to raise public safety awareness more than ever, and it is time to conduct disaster response training frequently. As a solution, training systems utilizing virtual reality, a major technology of the 4th industrial revolution, are attracting attention. The virtual training system industry is an ICT convergence industry that implements computer simulations of virtual environments similar to defense, medical, and disaster scenes [1]. Existing fire safety education requires a lot of manpower and cost, and since the training is not practical, the students participating in it have little immersion. In this paper, we propose a virtual training system for effective fire disaster response training. In particular, we develop fire-training virtual reality contents using Oculus Go, a standalone HMD with which VR can be experienced without a smartphone or computer.

Keywords—Virtual Reality, Virtual Training System, Fire Disaster Drill, Standalone HMD, Oculus Go

I. INTRODUCTION

In April 2014, about 300 students who were excited about a school field trip died in the Sewol ferry wreck in Korea, and the whole country was in sorrow. In December 2017, even before that sorrow had faded, 29 people died in the fire at the Jecheon sports center in Korea. Compared to neighboring Japan, there are too many man-made disasters in Korea due to insensitivity to safety. Now, it is necessary to raise public safety awareness more than ever, and it is time to conduct disaster response training frequently. As a solution, training systems utilizing virtual reality, a major technology of the 4th industrial revolution, are attracting attention. The virtual training system industry is an ICT convergence industry that implements computer simulations of virtual environments similar to defense, medical, and disaster scenes [1]. The need for virtual training education for infants and for elementary, junior high, and high school students, who are weak in responding to disasters, has become more and more important [2]. Existing fire safety education requires a lot of manpower and cost, and since the training is not practical, the students participating in it have little immersion. In this study, we propose a virtual training system for effective fire disaster response training.

The proposed system not only saves time and space compared to existing training, but also enables experiential training as in a real environment. In particular, based on a standalone HMD that allows virtual reality to be experienced without a computer or smartphone, more students can easily train at any time.

II. STANDALONE HMD

Recently, 'wireless standalone HMDs' that allow users to experience VR alone, without a computer or smartphone, have been developed and are gaining popularity. For example, Facebook released 'Oculus Go' in May 2018. 'Oculus Go' can be used for up to 2 hours and 30 minutes on the 2560×1440 resolution LCD display provided by the headset itself, and the built-in battery gives wire-free freedom [3].

In May 2019, 'Oculus Quest' was released, continuing the lineage of the Oculus standalone HMDs. 'Oculus Quest', shown in Fig. 1, can run higher-spec content than 'Oculus Go', and 360-degree spatial rotation is possible. In 2020, Facebook introduced 'Oculus Quest 2', a next-generation virtual reality (VR) headset with a lower price and higher image quality and performance. With the development of standalone HMDs, the VR market is growing rapidly.

Fig. 1. Standalone HMDs: Oculus Go and Oculus Quest

III. DESIGN AND IMPLEMENTATION BASED ON A STANDALONE HMD

A. Design

This virtual reality fire disaster training simulation is developed using Unity 3D and Oculus Go

including the Oculus controller. Unity 3D is general-purpose content development software that is used in various fields other than games. In addition, a project developed with Unity can be built for a number of platforms by changing the settings slightly. Android Studio provides the development environment required to build and run applications for Android. For the fire safety drill content design, objects such as a Light, Spark Particle System, Fire Particle System, Hammer, Fire Extinguisher and Gas Particle System, Box, Glass, Broken Glass, Alarm Button, Sprinkler and Water Shower Particle System, a Canvas to display text, and an Arrow to show the direction of the player's movement are required in a classroom. Once the objects are imported, they are arranged as required by the storyline.

B. Implementation

To set up the project, the Oculus Go Utilities, OVR, and Plugins are imported into the Assets folder in Unity. Note that a scene named GearVrControllerTest is present in the OVR Scene folder. This scene is opened, and a folder named Prefab is created in the Assets folder. OVRCameraRig and OculusGoControllerModel are dragged and dropped into the scene; in this way OculusGoControllerModel is imported. After importing, these objects are placed as required by the storyline. Colliders and Rigidbody physical properties are added to the objects in the scene to prevent them from free-falling. For remote control, code for moving forward and backward in the classroom using the Trigger and Back buttons was written and attached to the Player Controller. Virtual content that simulates a fire in a room and the process of extinguishing it is created. A flowchart is given below to describe the execution of actions to accomplish the fire drill training.

Fig. 3. Simulation of Fire Scene

As shown in Fig. 3, a fire scene is simulated with a Room, a Canvas to display text, an Arrow to show the direction of movement, Fire Particle System, Hammer, Fire Extinguisher and Gas Particle System, Box, Glass, Broken Glass, Alarm Button, Sprinkler, and Water Shower Particle System as the objects required for the content.

Fig. 4. Displaying Guideline to Player in order to accomplish Fire Drill

As shown in Fig. 4, a canvas with a direction arrow is displayed to the player, guiding him to find the alarm and fire extinguisher. By pressing the Trigger button on the Oculus Go controller, the player moves forward.
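The movement code attached to the Player Controller is a C# Unity script in the actual content; as a language-neutral illustration of the trigger/back-button logic (all names hypothetical, not the Unity API), the update amounts to:

```python
def update_position(position, forward, trigger_pressed, back_pressed, speed=1.0):
    """Move along the view direction while Trigger is held, and backwards
    while the Back button is held (hypothetical signature)."""
    x, y, z = position
    fx, fy, fz = forward
    if trigger_pressed:
        x, y, z = x + speed * fx, y + speed * fy, z + speed * fz
    if back_pressed:
        x, y, z = x - speed * fx, y - speed * fy, z - speed * fz
    return (x, y, z)
```

Called once per frame with the headset's forward vector, this reproduces the forward/backward classroom movement described above.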

Fig. 2. Fire Drill Training flowchart

All the required objects are downloaded from the Asset Store of Unity 3D and imported, following the scene-import procedure from the Oculus Go Utilities described above.

Fig. 5. Displaying Guideline to Grab Hammer and Break the Glass


As shown in Fig. 5, a guideline is displayed to the player to grab the hammer and break the glass in order to reach the alarm and fire extinguisher.

Fig. 6. Aligning the Fire Extinguisher to Fire

As shown in Fig. 6, as the player moves towards the fire and aligns the nozzle of the fire extinguisher towards it, the fire starts to diminish. As the player moves closer and keeps the nozzle aligned towards the fire, it keeps diminishing and is extinguished after some time. In the same way, the fire is extinguished at all locations.

IV. CONCLUSIONS

Disaster response training is necessary in order to avoid casualties caused by disasters [4]. In this paper, among many disasters, training contents in response to fire were dealt with. Existing fire safety education requires manpower and cost, and the training is not effective because it is not practical [5].

In this study, a virtual training system for effective firefighting and disaster response training was proposed. Virtual training systems generally use HMDs such as the Oculus or Vive, but in this paper, we proposed fire disaster response virtual reality content using a standalone HMD that allows users to experience VR with only the HMD, without a PC or smartphone. By using a standalone HMD, users can receive training more conveniently and save time and space compared to conventional fire drills. In addition, the training effect is high because experiential training can be performed at any time when necessary.

ACKNOWLEDGMENT

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2018-0-01431) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

REFERENCES

[1] M. S. Park, "Disaster Management of Building and Distributed Simulation Platform," Journal of Architectural Institute of Korea, vol. 57, no. 3, pp. 32-36, 2013.
[2] J. S. Han, "VR Tourism Content Using the HMD Device," Journal of Korean Contents Society, vol. 15, no. 3, pp. 40-47, 2015.
[3] "Oculus VR," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Oculus_VR.
[4] H. S. Kim, "The Use of 3D Virtual Reality Technique in the Web-based Earth Science Education," The Korean Society for Educational Technology, vol. 17, no. 3, pp. 85-106, 2001.
[5] C. S. Park, "Framework for integrating safety into construction methods education through interactive virtual reality," Journal of Professional Issues in Engineering Education and Practice, vol. 142, no. 2, 2015.

Fingerprint-based Cooperative Beam Selection for Cellular-Connected mmWave UAV communication

Sangmi Moon
Dept. of IT Artificial Intelligence, Korea Nazarene University, Cheonan, Republic of Korea
[email protected]

Hyeonsung Kim
Dept. of Electronic Engineering, Chonnam National University, Gwangju, Republic of Korea
[email protected]

Intae Hwang
Dept. of Electronic Engineering and Dept. of ICT Convergence System Engineering, Chonnam National University, Gwangju, Republic of Korea
[email protected]

Abstract—In this paper, we propose a fingerprint-based cooperative beam selection scheme for cellular-connected millimeter-wave (mmWave) unmanned aerial vehicle (UAV) communication. The proposed scheme consists of offline fingerprint database construction and online beam cooperation. System-level simulations are performed to assess the UAV effect based on the 3rd Generation Partnership Project new radio mmWave and UAV channel models. Simulation results show that the cooperative beam selection scheme can reduce the beam sweeping overhead and improve the signal-to-interference-plus-noise ratio and spectral efficiency.

Keywords—beam cooperation; beam selection; fingerprint; mmWave; UAV

I. INTRODUCTION

The 3rd Generation Partnership Project (3GPP), which oversees the standards activities for cellular networks, has recently concluded a study item [1] to explore the challenges and opportunities of serving UAVs as aerial user equipments (UEs), known as cellular-connected UAVs, in coexistence with terrestrial users. Cellular-connected UAVs leverage the terrestrial base stations (BSs) of cellular networks to realize high-performance UAV-terrestrial communication [2]. In such a network, a UAV receives line-of-sight (LoS) transmissions from numerous BSs as its altitude increases. Thus, a given UAV can be subjected to considerable interference from numerous ground BSs that transmit toward other terrestrial users or UAVs. Consequently, downlink transmissions toward aerial devices generally suffer from poor signal-to-interference-plus-noise ratios (SINRs) more often than their terrestrial counterparts. Millimeter-wave (mmWave) communication, which utilizes the wide bandwidth available above 28 GHz, is a promising technology to achieve high-rate UAV communication [3].

In this paper, we propose a fingerprint-based cooperative beam selection scheme for cellular-connected mmWave UAV communication. The proposed scheme constructs an offline fingerprint database for beam selection and performs online beam cooperation. In the offline phase, the best beam index from the serving cell and the interference beam indexes from neighboring cells are stored. In the online phase, the best beams and interference beams are determined using the information from the fingerprint database instead of an exhaustive search, and beam cooperation is performed to improve the SINR for aerial UEs.

II. 3GPP SYSTEM MODEL

We consider downlink cellular-connected UAV communication, where BSs are deployed in a hexagonal layout to communicate with their associated UEs. Each terrestrial BS employs $N_{\text{BS}}$ antennas and $N_{\text{RF}}$ RF chains to serve $K$ single-antenna UEs (including both terrestrial and aerial UEs) [4], [5]. The received signal $y_{bk} \in \mathbb{C}$ of the $k$th UE in the $b$th cell is given by

$y_{bk} = \mathbf{h}_{bbk}\mathbf{w}_b\mathbf{s}_b + \sum_{i \neq b} \mathbf{h}_{ibk}\mathbf{w}_i\mathbf{s}_i + \epsilon_{bk}$, (1)

where $\mathbf{w}_b \in \mathbb{C}^{N_{\text{BS}} \times K}$ is the digital precoding matrix; $\mathbf{s}_b \in \mathbb{C}^{K \times 1}$ is the vector of data symbols of cell $b$, i.e., the concatenation of the data streams of the UEs, $\mathbf{s}_b = [s_{b,1}, \ldots, s_{b,K}]^T$, with $s_{b,k}$ the data symbol for UE $k$ of cell $b$; $\epsilon_{bk} \sim \mathcal{CN}(0, \sigma_{\epsilon}^2)$ is the thermal noise; and $\mathbf{h}_{ibk} \in \mathbb{C}^{1 \times N_{\text{BS}}}$ denotes the channel vector between BS $i$ and UE $k$ in cell $b$.

The channel model can be translated into the beam space or beam domain [4], [5] via a beamforming matrix $\mathbf{U}$, which is an $N \times N$ unitary matrix. For a uniform linear array (ULA) with $N$ elements, the $b$th beamforming vector of such a codebook is given by

$\mathbf{U}_{\text{ULA}}(b) = \frac{1}{\sqrt{N}}\left[1, e^{j2\pi\frac{b}{N}}, \ldots, e^{j2\pi\frac{b}{N}(N-1)}\right]^T$, (2)

where $b = 0, 1, \ldots, N-1$ is the beam index. For a uniform planar array (UPA), the corresponding beamforming vector is obtained as the Kronecker product of the 1D beamforming vectors in the horizontal and vertical domains. The codebook for the UPA is given by

$\mathbf{U}_{\text{UPA}}(k, l) = \mathbf{U}_{\text{ULA}}(k) \otimes \mathbf{U}_{\text{ULA}}(l)$, (3)

where $k = 0, 1, \ldots, N_v - 1$, $l = 0, 1, \ldots, N_h - 1$, and $\otimes$ denotes the Kronecker product. The beamspace channel model can then be represented by

$\widetilde{\mathbf{H}} = [\tilde{\mathbf{h}}_1, \tilde{\mathbf{h}}_2, \ldots, \tilde{\mathbf{h}}_K] = \mathbf{U}\mathbf{H} = [\mathbf{U}\mathbf{h}_1, \mathbf{U}\mathbf{h}_2, \ldots, \mathbf{U}\mathbf{h}_K]$. (4)
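The ULA and UPA codebooks of Eqs. (2) and (3) are easy to reproduce numerically; the sketch below builds them with the standard library only and can be used to check that the N ULA beams are orthonormal (i.e., that U is unitary).

```python
import cmath

def ula_beam(b, n):
    """b-th ULA beamforming vector of Eq. (2): a normalized DFT column."""
    return [cmath.exp(2j * cmath.pi * b * k / n) / n ** 0.5 for k in range(n)]

def kron(u, v):
    """Kronecker product of two vectors, as used in Eq. (3)."""
    return [a * b for a in u for b in v]

def upa_beam(k, l, n_v, n_h):
    """(k, l)-th UPA beamforming vector of Eq. (3)."""
    return kron(ula_beam(k, n_v), ula_beam(l, n_h))
```

Because the ULA beams are columns of a scaled DFT matrix, distinct beams have zero inner product, and the UPA codebook inherits this property through the Kronecker product.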


Here, $\tilde{\mathbf{h}}_k$, $k = 1, 2, \ldots, K$, is the beamspace channel vector of the $k$th UE. When the BS is equipped with $N_{\text{RF}} = K$ RF chains, a reduced-dimension matrix $\widetilde{\mathbf{H}}_b \in \mathbb{C}^{K \times K} = \widetilde{\mathbf{H}}(m, :)_{m \in \mathcal{S}}$ should be constructed, where $\mathcal{S}$ is the index set of the selected beams. We use a simple zero-forcing (ZF) precoder as the digital precoder. The reduced-dimension ZF precoding matrix $\widetilde{\mathbf{w}}_b \in \mathbb{C}^{K \times K}$ is expressed as

$\widetilde{\mathbf{w}}_b = \alpha \widetilde{\mathbf{H}}_b (\widetilde{\mathbf{H}}_b^H \widetilde{\mathbf{H}}_b)^{-1}$, (5)

where $\alpha$ is a scaling factor.

III. FINGERPRINT-BASED BEAM SELECTION AND COOPERATION

A. Fingerprint Database Construction

The measurement is performed in each cell by assigning terrestrial and aerial UEs to each fingerprint spot; the measurement content includes the fingerprint position, the serving cell ID, and the corresponding optimal beam ID. If the UE is an aerial UE, the interference cell IDs and the corresponding strongest interference beam IDs also need to be measured. The measurement result is then saved in the corresponding fingerprint database, i.e., the set of all fingerprint information within a cell's coverage area. Each fingerprint database is divided into two parts, namely the terrestrial and aerial fingerprint datasets. Table I shows the fingerprint database of cell $b$, in which $P_a$ ($a = 1, 2, \ldots, A$) and $P_t$ ($t = A+1, A+2, \ldots, A+T$) are the aerial and terrestrial fingerprint positions, respectively. $I\_ID_i$ ($i = 1, 2$) is an interference cell ID for a fingerprint position, $b_t^*$ is the optimal beam ID for $P_t$, and $b_a^*$ and $\tilde{b}_a^i$ are the optimal beam ID and the strongest interference beam ID, respectively, corresponding to interference cell $i$ for $P_a$. Note that the number of interference cells is two ($i = 1, 2$) because we consider intra-site joint transmission (JT) for aerial UEs, as discussed in the following subsections.

TABLE I. EXAMPLE OF FINGERPRINT FOR CELL $b$. Aerial UE: positions $P_1, \ldots, P_A$, each with optimal beam ID $b_a^*$ and interference entries $(I\_ID_1, \tilde{b}_a^1)$ and $(I\_ID_2, \tilde{b}_a^2)$. Terrestrial UE: positions $P_{A+1}, \ldots, P_{A+T}$, each with optimal beam ID $b_t^*$.

B. Beam Cooperation

The BS can further determine to which fingerprint dataset of the UE's serving cell the matched fingerprint position belongs. If the matched fingerprint position is an aerial fingerprint position, the UE is identified as an aerial UE; otherwise, it is a terrestrial UE. When a UE is a terrestrial UE, the serving cell only needs to transmit on the best beam for the UE, according to the beam ID corresponding to the matched fingerprint position. As aerial UEs experience LoS propagation conditions toward more cells with higher probability than terrestrial UEs, they receive interference from more cells in the downlink. Therefore, JT is applied to improve the performance of aerial UEs. In this study, multiple cells belonging to the same site are coordinated, and data are jointly transmitted to the UEs. As intra-site JT coordinated multi-point is already supported in 3GPP standardization, no enhancements are required. With beam cooperation for an aerial UE, the interference from a neighboring cell is reduced because the neighboring cell is converted from an interference signal into a desired signal. This advantage can be observed from the SINR expression. Assuming that the UEs have perfect CSI, the resulting instantaneous SINR with beam cooperation at aerial UE $k$ in cell $b$, obtained as an expectation over all symbols, is

$SINR_{\text{coop}} = \dfrac{\sum_{c \in \mathcal{C}} 10^{\xi_c/10}}{\left(\sum_{j \in \mathcal{B}} 10^{\xi_j/10}\right) - \sum_{c \in \mathcal{C}} 10^{\xi_c/10} + N_0}$, (6)

where $\xi_j$ is the received power of UE $k$ from cell $j$, and $\mathcal{B}$ and $\mathcal{C}$ represent the set of BSs and the set of JT BSs, respectively. In contrast, the SINR without beam cooperation at terrestrial UE $k$ in cell $b$ is

$SINR_{\text{non-coop}} = \dfrac{10^{\xi_b/10}}{\left(\sum_{j \in \mathcal{B}} 10^{\xi_j/10}\right) - 10^{\xi_b/10} + N_0}$. (7)

Thus, beam cooperation not only boosts the signal strength but also reduces the interference level.

IV. SIMULATION RESULTS

The following schemes are simulated for comparison:

(1) Random beam selection (R-BS): each UE is served by only its serving cell, and beams are selected randomly from among all the beams.
(2) Exhaustive-search beam selection (ES-BS): each UE is served by only its serving cell, and beams are selected via an exhaustive search.
(3) Beam selection with single-cell transmission (SC-BS): each UE is served by only its serving cell, and fingerprint-based beam selection is used.
(4) Beam selection with JT (JT-BS): the same as scheme 3, but intra-site JT is used for aerial UEs.

Figure 1 plots the CDF of the SINR for schemes 1 to 4, where the aerial UE ratio is 50% (5 aerial UEs per cell). The proposed beam selection (JT-BS) achieves a higher SINR than the other schemes. This is because the percentage of aerial UEs experiencing cell-edge-like radio conditions (i.e., poor downlink SINR) is much higher than that of terrestrial UEs. For schemes (3) and (4), the performance is greatly affected by the accuracy of the UE's fingerprint position measurement. When the error between the fingerprint position and the actual position of the UE is large, the matching result at the BS will be incorrect, and the selected beam is not the best beam for the current UE.
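Equations (6) and (7) can be evaluated directly from the per-cell received powers in dB; a sketch with hypothetical inputs:

```python
def to_linear(x_db):
    """Convert a power in dB to linear scale."""
    return 10 ** (x_db / 10)

def sinr_coop(rx_db, coop_cells, n0):
    """Eq. (6): power from the JT set C counts as signal; the rest of the
    total received power, plus the noise n0 (linear), is interference."""
    signal = sum(to_linear(rx_db[c]) for c in coop_cells)
    total = sum(to_linear(p) for p in rx_db.values())
    return signal / (total - signal + n0)

def sinr_non_coop(rx_db, serving_cell, n0):
    """Eq. (7): only the serving cell b contributes signal."""
    return sinr_coop(rx_db, [serving_cell], n0)
```

Adding a cell to the JT set moves its power from the denominator to the numerator, so the cooperative SINR can only increase, which matches the claim that JT both boosts the signal and removes interference.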

Although the fingerprint-based scheme shows slightly degraded performance, it avoids the computational complexity of the exhaustive search. In the fingerprint-based beam selection schemes, the UE only needs to feed back its current position, and the BS matches the current position of the UE against the fingerprint database to obtain the optimal and interference beams.

Figure 2 plots the CDF of the SINR for schemes 3 and 4 in cellular-connected UAV communication when the aerial UE ratios are 7.1% and 50%. With the increase in the number of aerial UEs, the inter-cell beam interference increases, and the SINR of scheme 4 is significantly improved compared with that of scheme 3. This is because beam cooperation boosts the signal strength and reduces the interference level in scheme 4, which is an advantage of interference mitigation. As aerial UEs experience LoS propagation conditions toward more cells with higher probability than terrestrial UEs, they receive interference from more cells in the downlink. When the aerial UE ratio is 7.1%, the performances of schemes 3 and 4 tend to coincide: a smaller number of aerial UEs leads to fewer transmission beams in each cell, so the possibility of inter-cell beam interference is very small and the two schemes perform alike. It can be concluded that the fingerprint-based cooperative beam selection scheme can effectively avoid inter-cell interference in cellular-connected UAV networks with densely deployed aerial UEs. With an increase in the number of aerial UEs, the advantage of beam cooperation becomes more evident.

Fig. 2. CDF of the SINR for schemes 3 and 4, with aerial UE ratios of 7.1% and 50% (1 and 5 aerial UEs per cell, respectively).
Figure 3 plots the CDF of the spectral efficiency for schemes 3 and 4 in cellular-connected UAV communication when the aerial UE ratios are 7.1% and 50%. Each value of SINR is mapped to the rate achievable on a given physical resource block by assuming ideal link adaptation, i.e., choosing the optimum modulation and coding scheme that yields a desired block error rate (BLER) [6]. We set the BLER to 10^-1, which we regard as sufficiently low considering that retransmissions further reduce the number of errors. Note that the spectral efficiency does not exceed 7.44 bps/Hz, which is the maximum spectral efficiency in 5G.

Fig. 3. CDF of the spectral efficiency for schemes 3 and 4, with aerial UE ratios of 7.1% and 50% (1 and 5 aerial UEs per cell, respectively).

V. CONCLUSION

In this paper, we proposed a fingerprint-based beam selection and cooperation scheme for mmWave UAV communication. The proposed scheme constructs an offline fingerprint database for beam selection and performs online beam cooperation. System-level simulations were performed to assess the UAV effect based on the 3GPP new radio mmWave and UAV channel models. The simulation results showed that the proposed beam selection scheme can reduce the beam sweeping overhead and the inter-cell interference.

Fig. 1. CDF of the SINR for schemes (1)-(4), with a 50% aerial UE ratio (5 aerial UEs per cell).

ACKNOWLEDGMENT

This research was supported by the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2020-2016-0-00314) supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP). This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT:


Ministry of Science and ICT) (2020R1I1A1A01073948 and 2021R1A2C1005058). This research was also supported by the BK21 FOUR Program (Fostering Outstanding Universities for Research, 5199991714138) funded by the Ministry of Education (MOE, Korea) and the National Research Foundation of Korea (NRF).

REFERENCES

[1] 3GPP, "3rd Generation Partnership Project; Enhanced LTE support for aerial vehicles," TR 36.777, V15.0.0, Jan. 2018.
[2] Y. Zeng, J. Lyu, and R. Zhang, "Cellular-connected UAVs: Potentials, challenges and promising technologies," IEEE Wireless Commun., vol. 26, no. 1, pp. 120-127, Feb. 2019.
[3] R. Amorim, H. Nguyen, P. Mogensen, I. Z. Kovács, J. Wigard, and T. B. Sørensen, "Radio channel modeling for UAV communication over cellular networks," IEEE Wireless Commun. Lett., vol. 6, no. 4, pp. 514-517, Aug. 2017.
[4] A. Sayeed and J. Brady, "Beamspace MIMO for high-dimensional multiuser communication at millimeter-wave frequencies," in Proc. IEEE Global Commun. Conf. Workshops (GC Wkshps), Dec. 2013.
[5] P. V. Amadori and C. Masouros, "Low RF-complexity millimeter-wave beamspace-MIMO systems by beam selection," IEEE Trans. Commun., vol. 63, no. 6, pp. 2212-2223, June 2015.
[6] 3GPP, "3rd Generation Partnership Project; Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Layer Procedures," TS 36.213, V16.3.0, Oct. 2020.

Intelligent Infection Prevention Monitoring System Through CCTV Video Analysis around a Confirmed Patient Distance

Dongsu Lee and Sang-Joon Lee
Interdisciplinary Program of Digital Future Convergence Service
Chonnam National University
Gwangju, Korea
[email protected], [email protected]

ASHIQUZZAMAN A K M
Corporate Research Institute
XISOM INC.
Daejeon, Korea
[email protected]

Seungmin Oh, Jihoon Lee, Yeonggwang Kim, Sangwon Oh and Jinsul Kim
Department of ICT Convergence System Engineering
Chonnam National University
Gwangju, Korea
{osm5252kr, ghost6268, yklovejesus, osw0788}@gmail.com, [email protected]

Abstract—In this research, we use the K-means clustering technique and a deep learning-based crowd counting coefficient to derive groups and group-specific infection rates on a confirmed-person basis, using CCTV images near the movement routes of newly confirmed patients in an intelligent anti-epidemic monitoring system. We show that the PSNR is 21.51 and the final MAE for the entire dataset is 67.98 after 300 cycles of learning over all input images.

Keywords—Infectious Disease; Crowd Counting; K-means Clustering; Monitoring System; Disease Prevention

I. INTRODUCTION

With the current global spread of coronavirus disease 2019 (hereafter referred to as COVID-19), Korea has continued to be concerned about public health and national security. These infectious diseases are driven by the evolution of mutated viruses over time, and the speed of transmission has increased dramatically as a result of the expansion of global transportation. As of midnight on May 28, the Central Disease Control Headquarters reported 138,898 confirmed cases in Korea, of which 128,761 had been released from quarantine, 8,191 remained under quarantine, and 1,946 had died. In China, it was revealed that 40% of patients had no initial fever, implying that silent COVID-19 infections could still spread the disease if carriers were not aware of their illness (Table I).

TABLE I. CASES OF COVID-19 IN KOREA [1]

Confirmed cases    Released from quarantine    Quarantined    Deceased
138,898            128,761                     8,191          1,946

In this study, deep learning-based crowd counting and K-means clustering group each neighbor of a confirmed patient in order of risk, and the resulting group infection rates are sent to the manager monitor to support existing contact tracing. The study looked at the impact of using this approach to help identify each possible patient who may or may not have been reached by a federal agency or local government.

In recent years, there has been a great deal of research on preventing the spread of infectious diseases. [2] proposed a system for sustainable farm management and rapid response to the spread of infectious diseases through the development of technologies to recognize abnormalities in poultry, which were applied and verified to minimize damage to farms and related industries. [3] presents a Random Forest-based strategy for predicting outbreaks in several regions with a single model. Furthermore, [4] applies digital forensics methodologies to the examination of infectious disease contacts, iterating the process of identifying infected individuals' whereabouts and contact persons and defining risk categories, presented as a single list sorted by importance. [5] analyzed domestic social contact patterns to properly anticipate and manage the development of new or re-emerging infectious diseases in Korea, and advised the government to implement infectious disease prevention measures accordingly.

As a methodology for identifying the number of individuals present in images, crowd counting is one of the most important areas of video-based surveillance applications. This methodology can yield important information, such as the size or dispersion of crowds, from small to large groups. In recent years, interest in crowd counting technology has steadily increased. The varying sizes, distributions, and appearances of the persons represented in the crowd [6] are among the numerous challenges that must be addressed. Pedestrian detection-based methods, crowd number regression approaches, and deep learning-based crowd counting approaches have all been used in crowd counting research.


Deep learning-based crowd counting approaches, which indirectly estimate crowd size from generated density maps rather than measuring it directly [7], are among the most actively investigated methodologies at present. The K-means clustering technique divides a set of data into k categories. It is quite simple and works well for fast processing of large data. The computation is iterated until the distance between each data point and its assigned cluster is minimized. However, the number of clusters (k) must be provided in advance, and the utility and performance of the results can deviate substantially [8]. [9] presents a K-means clustering algorithm that guarantees privacy for encrypted data by measuring the distance between two vectors with the Manhattan distance and finding its minimum. Furthermore, [10] developed a strategy that employs K-means clustering to extract information from enormous volumes of sports-spectator big data for visitor marketing in sports stadiums.

Fig. 2. Example of a figure caption. (figure caption)

The distance between a confirmed person and nearby people is calculated using a deep learning-based crowd counting module, a video-based distance measuring module, and a K-means clustering-based infection rate estimation module. The estimated infection rate is 70% if the distance from the confirmed person is less than 6 feet, 30% if it is between 6 and 10 feet, and 10% if it is more than 12 feet. We describe the infection rate according to the matching group of users, assuming k (the number of groups) is 3 (Figure 3).
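The distance-to-infection-rate mapping and the k = 3 grouping described above can be sketched as follows. This is a minimal illustration, not the paper's actual modules: the positions are made up, the thresholds are applied to each group's mean distance, and a plain K-means implementation stands in for the clustering module.

```python
import numpy as np

# Infection-rate mapping from the paper: 70% within 6 ft, 30% between
# 6 and 10 ft, 10% beyond 12 ft of the confirmed person.
def infection_rate(distance_ft):
    if distance_ft < 6:
        return 0.7
    if distance_ft <= 10:
        return 0.3
    return 0.1

def kmeans(points, k=3, iters=100, seed=0):
    """Plain K-means on 2-D positions (stand-in for the clustering module)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centers[None], axis=2), axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical positions (in feet) of people detected around a
# confirmed person standing at the origin.
confirmed = np.array([0.0, 0.0])
people = np.array([[1.0, 2.0], [2.0, 1.0], [7.0, 3.0], [8.0, 4.0],
                   [15.0, 14.0], [16.0, 13.0]])
labels, centers = kmeans(people, k=3)

# Per-group infection rate, taken at the group's mean distance.
for j in range(3):
    d = np.linalg.norm(people[labels == j] - confirmed, axis=1).mean()
    print(f"group {j}: mean distance {d:.1f} ft, rate {infection_rate(d):.0%}")
```

In a real deployment the point set would come from the crowd counting module's head positions rather than hand-written coordinates.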

II. PROPOSED METHOD AND RESULT

A. Learning Method Design

An intelligent anti-epidemic monitoring server stores the CCTV lists (videos) that capture the movements of confirmed patients, received from administrator monitors (e.g., local government situation rooms), and calculates, for each CCTV video, how many people appear and how far (in feet) each person is from the confirmed patient. The resulting groups and infection rates are transferred to the manager monitor as combined information on the personnel surrounding the predicted movements of potential patients (Figure 1).

Fig. 1. Proposed Implementation Scenario Workflow Chart

The administrator monitor displays the groups classified as potential patients and the infection rates calculated from distance by the intelligent anti-proliferation monitoring server (Figure 2).

Fig. 3. Each person around a Confirmed Person with K-means Clustering

B. Learning Result and Discussion

We use the architecture of VGGNet to create a novel crowd counting neural network in this paper. We extract multi-scale features to accommodate the scale variation of crowds in images. The proposed crowd counting network begins with 64-node 2D convolutional layers; after a single max-pooling operation, it continues with 128-node 2D convolutional layers and three 256-node convolutional layers. The output is a single-node 2D convolutional layer, with ReLU activation functions on all layers. Using geometry-adaptive Gaussian kernels, the crowd annotations yield ground-truth

density maps. We used the Mean Absolute Error (MAE) and the Peak Signal-to-Noise Ratio (PSNR) as performance metrics. The ShanghaiTech dataset, which contains 1,198 images with a total of about 330,000 annotations, was used as learning data. It is divided into two parts: part A, with 482 images, and part B, with 716 images taken in Shanghai, China.
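The ground-truth construction and the two metrics can be illustrated with a small sketch. Note the simplifications: a fixed Gaussian bandwidth is used here, whereas the paper uses geometry-adaptive kernels whose bandwidth depends on neighboring head distances, and all sizes and names below are illustrative.

```python
import numpy as np

def density_map(shape, head_points, sigma=4.0):
    """Ground-truth density map: one normalized Gaussian per annotated head,
    so the map integrates to the head count."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dm = np.zeros(shape)
    for (py, px) in head_points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2 * sigma ** 2))
        dm += g / g.sum()          # each person contributes a mass of 1
    return dm

def mae(pred_counts, true_counts):
    """Mean Absolute Error over per-image predicted vs. annotated counts."""
    return float(np.mean(np.abs(np.asarray(pred_counts, dtype=float)
                                - np.asarray(true_counts, dtype=float))))

def psnr(pred_map, true_map):
    """Peak Signal-to-Noise Ratio between predicted and ground-truth maps."""
    mse = np.mean((pred_map - true_map) ** 2)
    peak = true_map.max()
    return float(10 * np.log10(peak ** 2 / mse))

# Toy example: three annotated heads in a 64x64 image.
gt = density_map((64, 64), [(20, 20), (40, 45), (22, 50)])
print(round(gt.sum(), 3))   # integrates to the head count, 3.0
```

Summing a predicted density map gives the predicted count, so MAE is computed over map sums, while PSNR compares the maps pixel-by-pixel.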

ACKNOWLEDGMENT

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2016-0-00314) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

Fig. 4. Predicted MAE of the neural network

To determine relative distances from a reference point, we use the Gaussian density maps generated from the video. We then build clusters by density, based on the distances predicted by the neural network, feeding the output matrix to two center-point-based K-means clustering algorithms. The PSNR over all input training images was 21.51 after 300 iterations of learning, and the final MAE for the entire dataset was 67.98 (Figure 4). This shows that the image-quality loss information and the observations have a low error for calculating infection rates and risk groups, as well as for predicting infection rates among confirmed people and their neighbors.

III. CONCLUSION

In this paper, we propose an intelligent anti-proliferation monitoring system based on CCTV image analysis along the movement routes of confirmed patients. To that purpose, deep learning-based crowd counting of confirmed persons and their neighbors is used to display the groups and group-by-group infection rates for each person's contacts, in order of risk, on the manager monitor.

REFERENCES

[1] The Central Disease Control Headquarters, "Cases of COVID-19 in Korea", http://ncov.mohw.go.kr/en/bdBoardList.do?brdId=16&brdGubun=161&dataGubun=&ncvContSeq=&contSeq=&board_id= (accessed May 28, 2021).
[2] H.W. Kim and C.W. Park, "A Study on the Development of Abnormal Information Collection and Monitoring System Technology for Early Detection of Avian Infectious Diseases," Proceedings of the Symposium of the Korean Institute of Communications and Information Sciences, 2019, pp. 661-662.
[3] J. Moon, S. Jung, H. Kim and E. Hwang, "Daily Occurrence Prediction of Regional Infectious Diseases Using Random Forest," Proceedings of the Korean Institute of Information Scientists and Engineers, 2019, pp. 335-337.
[4] I.H. Yoon, "Application of Digital Forensics Methodologies in Epidemiological Contact Tracing," Proceedings of the Korean Institute of Information Scientists and Engineers, 2019, pp. 1639-1641.
[5] H.S. Oh, "Systemic Review of Social Contacts of Person to Person Spread of Infections," Journal of Korea Academia-Industrial Cooperation Society, vol. 21, no. 2, 2020, pp. 85-93.
[6] K. Shim, J. Byun and C. Kim, "Multi-step Quantization Training for Crowd Counting," Proceedings of the Institute of Electronics and Information Engineers, 2019, pp. 983-985.
[7] J. Lee and S. Yoon, "Trend Analysis of Crowd Counting based on Deep Learning," Proceedings of the Institute of Electronics and Information Engineers, 2019, pp. 735-737.
[8] H. Choi, Q.P. Peng and W. Rhee, "Design and Implementation of the Machine Learning-based Restaurant Recommendation System," Journal of Digital Contents Society, vol. 21, no. 2, 2020, pp. 259-268.
[9] Y. Jeong, J.S. Kim and D.H. Lee, "Privacy-Preserving k-means Clustering of Encrypted Data," Journal of the Korea Institute of Information Security & Cryptology, vol. 28, no. 6, 2018, pp. 1401-1414.
[10] S. Park, S. Ihm and Y. Park, "A Study on Application of Machine Learning Algorithms to Visitor Marketing in Sports Stadium," Journal of Digital Contents Society, vol. 19, no. 1, 2018, pp. 27-33.

A Study on the Improvement of Resource Utilization Using Deep Learning Applications in Virtualized Platforms

Seungmin Oh
Department of ICT Convergence System Engineering
Chonnam National University
Gwangju, Korea
[email protected]

Yeonggwang Kim
Department of ICT Convergence System Engineering
Chonnam National University
Gwangju, Korea
[email protected]

Sangwon Oh
Department of ICT Convergence System Engineering
Chonnam National University
Gwangju, Korea
[email protected]

Jinsul Kim
Department of ICT Convergence System Engineering
Chonnam National University
Gwangju, Korea
[email protected]

Abstract—Recently, a lightweight virtualization technology based on the Linux OS, known as Docker, has been widely used in deep learning application services. Because deep learning applications require high-performance computing resources, research on using computing resources efficiently is essential for providing services through virtualization platforms. This paper monitors CPU and GPU resources using cAdvisor and nvidia-smi when operating a deep learning fire/smoke detection model and a Fashion-MNIST classification model in Docker containers. We also studied ways to improve resource utilization by comparing GPU resource usage with the time required for training container-based deep learning models. When operating a single deep learning application within a virtualization platform, multiple GPUs used in parallel achieve on average 88% higher utilization than a single GPU, and 19% higher when operating multiple deep learning applications. Comparing the training-time performance of the deep learning applications, a single GPU produced more efficient results when operating multiple deep learning applications.

Keywords—Virtualized Platform, Docker, Container Resource, Resource Utilization Improvement, Deep Learning Application

I. INTRODUCTION

Virtualized platform technology has emerged to use limited physical servers more efficiently, and various studies have been conducted on it. Virtualization technology is a broad term for abstracting computer resources, hiding physically existing resources from other systems, applications, and users. Container technology creates interprocess barriers in the host operating system (OS) via Linux Containers (LXC), which allow applications to run in their own isolated regions by combining namespaces and cgroups. Operating applications with virtualization technology has the advantage of running one or more services on computing capacity that, in traditional deployments, provided only one service per physical server, reducing maintenance costs and using resources efficiently. The advantages of container-based virtualization are heavily exploited today for deep learning technologies, which depend on large computing resources. Deep learning models are techniques that learn from data and label information using layered neural networks (NNs) to characterize and predict data at a level similar to or better than human intelligence. While sufficient resources are essential for the efficient training and optimization of deep learning applications, there is a lack of systematic analysis of operating deep learning models in container environments. Therefore, we configure three environments (CPU, GPU, and parallel GPU) to operate deep learning applications on a virtualized platform composed of containers, study how to improve resource utilization by monitoring the usage of CPU, RAM, and GPU in the containers, and compare performance. The composition of this paper is as follows. Section 2 introduces the required background techniques and related studies. Section 3 presents measures to improve resource utilization using smart factory applications in virtualization platforms, and Section 4 compares experimental results to derive performance. Finally, Section 5 concludes.

II. BACKGROUND AND RELATED WORKS

A. Docker container

Docker is an open-source virtualization platform that automates the deployment of Linux applications in software containers [1, 2]; it packages software into standardized units called containers, as a software platform that can quickly build, test, and deploy applications. Therefore, it has the advantage of better performance and of having no guest OS compared with

traditional virtualization technologies. Docker uses the concept of an Image, which includes the files and settings required to run a container. Even if the container is deleted, the image remains unchanged, allowing the desired application environment to be reconfigured when necessary. Docker can deliver code faster, standardize application operations, and improve resource utilization by facilitating migration, reducing application operating costs compared to traditional methods. Due to these advantages, Docker containers also performed better in container-based big data distributed processing environments than hypervisor-based Xen environments operating traditional virtual machines [3]. Recently, orchestration technologies such as Docker Swarm have emerged for automated deployment and configuration of containers, and research on efficient storage resource management has also appeared [4].

Fig. 1. Deep Learning Application Model Used to Monitor Resource Utilization
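As a rough illustration of the container and GPU monitoring used later in the paper, the sketch below polls GPU state via nvidia-smi and reads a cAdvisor-style stats payload. The cAdvisor field layout is an assumption based on its v1.3 REST API, and the parsing is demonstrated on canned output so the sketch runs without either daemon present.

```python
import json
import subprocess

def gpu_stats():
    """Poll GPU utilization via nvidia-smi (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)

def parse_gpu_csv(text):
    """Turn nvidia-smi CSV lines into per-GPU records."""
    rows = []
    for line in text.strip().splitlines():
        idx, util, mem = [f.strip() for f in line.split(",")]
        rows.append({"gpu": int(idx), "util_pct": int(util), "mem_mib": int(mem)})
    return rows

def cpu_total_ns(stats_json):
    """Cumulative container CPU time from a cAdvisor-style stats payload
    (field layout assumed from cAdvisor's v1.3 ContainerStats schema)."""
    stats = json.loads(stats_json)["stats"]
    return stats[-1]["cpu"]["usage"]["total"]

# Parsing demonstrated on canned output, so this runs without the daemons:
sample = "0, 88, 7421\n1, 63, 5120\n"
print(parse_gpu_csv(sample))
```

In the paper's setup, cAdvisor supplies the per-container CPU/RAM view and nvidia-smi supplies the GPU view; a monitoring loop would simply call these collectors on a timer and log the results per container.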

B. Deep learning

Deep learning has recently been applied in many areas, with research in fields such as computer vision, voice recognition, video recognition, and natural language processing. Deep learning has shown particularly good performance in computer vision research such as image classification, location recognition, and object detection. CNN-based deep learning algorithms [5, 6] that classify images of cancer cells or skin diseases are widely used in the medical field. Deep learning models consist of multiple layers of nodes and numerous neural network parameters, each of which must be learned to an appropriate value for the model to perform well. Deep learning allows computers to recognize, infer, and judge through extensive training, but because it requires high-performance computing resources, training and service operation using high-performance computers, clouds, and virtualization platforms based on GPUs or TPUs have recently attracted attention [7, 8].

III. PROPOSED METHODS

In this chapter, experiments were conducted by constructing a deep learning model to improve resource utilization. The experimental environment was built in containers created from Ubuntu 16.04 images on an Intel Core i7-9700K CPU, 64 GB RAM, and an NVIDIA GeForce RTX 2080, running Ubuntu 16.04 LTS and Docker 19.03.13. The single deep learning application model used a deep neural network to detect fire and smoke, as shown in Figure 1, and the dataset consisted of image data for fire, smoke, and general situations: 7,628 fire images, 9,999 smoke images, and 21,951 general images. Each image is input in the form [224, 224, 3], and each convolutional layer is followed by batch normalization, the ELU activation function, and max pooling. The model consists of four convolutional layers and three fully connected layers, with approximately 3.36 million learnable parameters. The deep learning applications used CUDA version 10.2 for the GPU. Figure 2 illustrates the system configured with cAdvisor and nvidia-smi to build the experimental environment for each container and monitor resource utilization.

Fig. 2. Configure a Virtualized Platform Resource Utilization Monitoring Environment with cAdvisor and Nvidia-smi

A. Optimization of Deep Learning Applications to Improve Virtualization Platform Resource Utilization

In this work, we monitor resource utilization when training deep learning application models in the same container environment, in order to study how to improve resource utilization through optimization of the models within virtualization platforms. The deep learning application models were trained and operated in three ways (CPU, GPU, and parallel GPU), as shown in Figure 2. The first environment used only CPU resources, not the GPU. Figure 3.a is a monitoring screen of the virtualization platform's resources when using only the CPU. Training requires high-performance computing resources, but because the GPU is not utilized, many resources are wasted: all CPU cores are overused to a load of 0.9 or higher while GPU resources sit idle. The second environment used CPU and GPU resources together, and Figure 3.b shows the resource utilization monitoring screen for that environment. Because the high-performance computation uses the GPU, each CPU core was no longer overloaded above 0.9, unlike when only CPU resources were used. We also found that each core is appropriately utilized by the scheduler, but the remaining GPU resources were wasted because only a single GPU among the available GPUs was used. The third environment used a model trained via parallel data processing to use GPU resources effectively. Figure 3.c shows the monitoring screen during parallel-processing training of the aforementioned deep learning applications. CPU resources are allocated to appropriate cores by scheduling, as in the single-GPU case. Furthermore, unlike when a single GPU

was used, all GPU resources could be used appropriately, without waste.

Fig. 3. Monitoring Deep Learning Application Resources (a) in a CPU resource-oriented Virtualized Platform, (b) in a CPU and GPU resource-oriented Virtualized Platform, (c) in a CPU and Parallel GPU resource-oriented Virtualized Platform.

B. Improvement of Resource Utilization by Multiple Deep Learning Applications in Virtualized Platform

Although parallel use of multiple GPU resources within virtualization platforms yields the highest resource utilization, resource monitoring comparisons across multiple deep learning applications have been lacking. In this chapter, we compare the performance of operating multiple deep learning applications within virtualization platforms by adding a model that classifies clothing images, trained on the Fashion-MNIST dataset [9], to the existing deep learning application. CPU resource utilization was similar to the previous experiments, so only GPU utilization was monitored. Figure 4 shows the screens monitoring GPU resource utilization when training the conventional fire/smoke detection model and the Fashion-MNIST classification model. When using a single GPU, Figure 4.a shows that the resources for the Fashion-MNIST classification model, on top of the existing application, increasingly burden the single GPU, and a large portion of memory is wasted. With multiple GPUs, Figure 4.b shows that GPU memory resources and usage are distributed and executed efficiently compared to a single GPU.

IV. EXPERIMENT

This chapter describes the results of the experiment with the proposed method. Figure 5 shows a graph of the time and GPU usage required for each training run when operating a deep learning application. When using only a single GPU resource, regardless of the number of deep learning applications, GPU resource usage averaged 63-64% with peaks up to 98%, showing little difference. However, with multiple-GPU parallelism, utilization reaches 88% on average when operating a single deep learning application and up to 53% when operating multiple deep learning applications, on average 18% and up to 25% higher. Therefore, in terms of improving the utilization of GPU resources, it is important to enable parallel processing across multiple GPUs and models. However, it is also important to reduce training time as well as resource utilization when operating deep learning applications within virtualization platforms. Utilizing GPU resources in the fire/smoke detection application saved about 100 seconds, running 269% faster, from 183 seconds down to 86 seconds, and the Fashion-MNIST classification model ran about 2.5 seconds (129%) faster. When operating multiple deep learning applications, the training times increased from 183 seconds and 11 seconds to 270 seconds and 24 seconds (147% and 218%). Comparing multiple-GPU parallelism against a single GPU, training took approximately 19 seconds (22%) less time than when running a single deep learning application, and approximately 1.5 seconds (18%) less for the Fashion-MNIST classification model, although in one case parallel GPU processing took approximately 0.6 seconds (106%) more time.

Fig. 5. GPU Resource and Execution Time Comparison for Multiple Applications on Virtualized Platform.
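The parallel data processing used in the third environment can be illustrated independently of any GPU library: each replica receives an equal shard of the batch, computes a local gradient, and the gradients are averaged (an all-reduce) before a shared weight update, i.e., synchronous data parallelism. The linear model below is a toy stand-in, not the paper's fire/smoke network, and the replicas are simulated with array shards.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of mean squared error for a linear model y ~ X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

def data_parallel_step(w, X, y, n_replicas=2, lr=0.01):
    """One synchronous data-parallel SGD step: each simulated 'GPU' gets an
    equal shard of the batch, computes its local gradient, and the gradients
    are averaged before the shared weights are updated."""
    shards = zip(np.split(X, n_replicas), np.split(y, n_replicas))
    grads = [grad_mse(w, Xs, ys) for Xs, ys in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)

# With equal shard sizes the averaged gradient equals the full-batch
# gradient, so 1, 2, or 4 simulated replicas give identical updates.
w2 = data_parallel_step(w, X, y, n_replicas=2)
w4 = data_parallel_step(w, X, y, n_replicas=4)
print(np.allclose(w2, w4))  # True
```

The mathematics is unchanged by the sharding; the speedup in the paper's experiments comes from the shard gradients being computed concurrently on separate GPUs.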

Fig. 4. Configure a Virtualized Platform Resource Utilization Monitoring Environment with cAdvisor and Nvidia-smi

V. CONCLUSION

This study discussed ways to improve resource utilization using deep learning applications in virtualization platforms built on Docker containers. Following the proposed experiments, this paper compares performance through resource monitoring, GPU resource utilization, and the training times of deep learning applications. The experiments show that multiple-GPU parallelism is the most efficient way to provide deep learning application services through virtualization platforms, but in terms of training time, a single GPU showed the highest performance. Deep learning application services using

virtualization platforms are expected to make the most of computing resources on edge computing or terminal devices close to the user, and many services will be built using artificial intelligence models.

ACKNOWLEDGMENT

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-2016-0-00314) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation), and is also the result of a study supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the ICT R&D program of MSIT/IITP [2019-0-00260, Hyper-Connected Common Networking Service Research Infrastructure Testbed].

REFERENCES

[1] D. Merkel, "Docker: Lightweight Linux Containers for Consistent Development and Deployment," Linux Journal, vol. 2014, no. 239, p. 2, May 2014.
[2] Docker Docs, https://docs.docker.com/config/containers/resource_constraints/.
[3] Hae-Jin Chung and Yun-Mook Nah, "Effects of Hypervisor on Distributed Big Data Processing in Virtualized Cluster Environment," KIISE Transactions on Computing Practices, vol. 22, no. 2, pp. 89-94, 2016.
[4] Sang-Hyeon Eum and Jik-Soo Kim, "Method for Efficient Storage Resource Management in Docker Container-based Cloud Environments," The Korean Institute of Information Scientists and Engineers, pp. 1247-1249, July 2020.
[5] Keo-Sik Kim, Moon-Seob Lee, Dong-Hoon Son, Jeong-Eun Kim, Gi-Hyeon Min, Kye-Eun Kim and Hyun-Seo Kang, "Performance of CNN-based Dermoscopic Image Classifier by using Data Balancing Algorithms," Journal of the Institute of Electronics and Information Engineers, vol. 57, no. 7, pp. 76-82, July 2020.
[6] Shin Kim and Kyoung-Gro Yoon, "Cancer Histopathological Image Classification based on Convolutional Neural Network," pp. 46-48, Nov. 2018.
[7] K. Ye, Y. Kou, C. Lu, Y. Wang and C.-Z. Xu, "Modeling application performance in docker containers using machine learning techniques," in Proc. 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), Singapore, Dec. 2018, pp. 1-6.
[8] B. B. Rad, H. J. Bhatti and M. Ahmadi, "An introduction to docker and analysis of its performance," International Journal of Computer Science and Network Security (IJCSNS), vol. 17, no. 3, p. 228, 2017.
[9] H. Xiao, K. Rasul and R. Vollgraf, "Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms," 2017, https://arxiv.org/abs/1708.07747.

A Study of VR Interactive Storytelling using Planning

Su-ji Jang
School of Games, Hongik University
2639 Sejong-ro, Jochiwon-eup, Sejong, South Korea
[email protected]

Byung-Chull Bae
School of Games, Hongik University
2639 Sejong-ro, Jochiwon-eup, Sejong, South Korea
[email protected]

Abstract—This paper proposes VR interactive storytelling content using planning, which automatically generates an action sequence to build a story. The generated content can provide two types of story endings depending on the player's choices. As the story material, we adopt the well-known fairy tale "Little Red Riding Hood" to implement the content. The players can select the VR story's point of view – either a first-person or a third-person perspective. We expect that the player's different viewpoints can contribute to the player's immersion and emotional induction in the VR environment.

Keywords—Virtual Reality; Narrative; Interactive Storytelling; Planning; Unity; Little Red Riding Hood

I. INTRODUCTION

With the recent development of artificial intelligence (AI) and big data technology, AI-based automatic content creation is actively researched for economical and efficient mass production. For example, AI algorithms can generate movie scripts or write articles based on collected online information [1][2]. In the game domain, due to the many sub-contents in a single game, studies on automatic content creation cover various aspects such as story, quest, level design, and item creation [3][4].

Creating a good (or interesting) story is difficult even for humans. While there are many different elements that make good-quality or interesting stories, coherence is a key factor. In computational storytelling, planning has been commonly used to generate coherent stories, as the story planner can find events based on the causal relations between them [5]. A recent data-driven approach with a Transformer language model utilizes common-sense reasoning to enhance coherence in story generation [6]. In this paper, we present a planning-based approach to generating a coherent story on a VR platform.

Virtual Reality (VR) content provides a higher level of immersion than other 3D content, as the player's actions, such as gaze or head movement, are immediately reflected in the game's visual field. Also, players can freely explore the VR world by wearing a Head-Mounted Display (HMD) and can interact with objects through the controller. This provides a high level of immersion, as if the player became the main character [7]. Due to these immersive advantages, many studies on VR content are actively conducted in various fields. In particular, VR devices can be successfully combined with interactive storytelling techniques for creating VR interactive storytelling content [8]. In this paper, we present an approach to generating VR interactive storytelling content in which the computer automatically generates an action sequence depending on the player's choice.

II. RELATED WORKS

A. Story Generation using Planning

Planning is an AI technique that automatically finds a sequence of actions to reach a goal state from an initial state. Several studies have intensively used planning for story generation [9], including an intent-based planning approach focusing on the balance between character goals and plots [10], emotion-based planning for emergent narrative generation [11], and narrative control knowledge for real-time interactive storytelling [12].

B. Interactive storytelling

Interactive storytelling enhances immersion by providing the player with choices among various story branches, which can actively engage the player in the story [13]. Well-known examples of interactive storytelling include Façade [14] and a commercial game, Reigns, by Devolver Digital [15].

Façade [14] is an AI-based interactive drama in which the player is invited by a couple to dinner and then needs to mediate a fight between them. The player can directly influence the story progression by inputting text with the keyboard. As the player's text input affects the main story plot, the player's involvement is actively encouraged.

Reigns [15] is a commercial card-style storytelling game in which the player becomes a king in the Middle Ages and rules a country. When a specific event is given, the player can choose options depicted as left or right cards to advance the story. Like Façade, the player's choices affect the outcome of the event and the final storyline.

C. VR storytelling content

Storytelling research on VR content is essential, as it can provide players with various interactions and events, moving beyond experience-based content that relies mainly on hardware technology. Due to its immersive features, VR storytelling content often highlights the player's vicarious experiences, such as preventing feral-cat abuse by playing the role of a stray kitten [16] or escaping from a burning building [17].

30 2021 International Conference on Digital Contents: AICo(AI, IoT and Contents) Technology (DigiCon-21)

III. IMPLEMENTATION

We utilize the Unity game engine (version 2019.3.11f1), Unity's AI Planner package (version 0.2.3), and the SteamVR asset to implement the VR interactive storytelling content. We also provide object interaction for the immersive VR content using a controller (e.g., UI interaction, picking a flower, etc.).

A. Story Material and Flow

As story material, we adopt the well-known fairy tale 'Little Red Riding Hood.' We assume that players who have less experience with VR can comfortably play our VR game with this familiar story content. Planning can find an optimal way to achieve a given goal. To create a coherent story using the Unity AI Planner, the following three elements need to be set appropriately:

• Trait: defining the story domain
• Goal State: describing the desired state at the end of the story
• Actions: describing possible events to reach the goal

Fig. 1. Story flow diagram based on AI agent actions.

Accordingly, we partially adapt the original 'Little Red Riding Hood' to fit the VR interactive storytelling content with planning (e.g., removing the hunter character, adding an additional ending, etc.).

The adapted story begins with Little Red Riding Hood (referred to as Red Hood) carrying a basket and running an errand to visit her granny. As the story progresses, Red Hood encounters a wolf at a crossroad in the forest. At this point, the player has to choose one of two options: '1. Greet' or '2. Ignore'. As shown in Figure 1, our content has a multi-ending structure in which the story ending changes depending on the player's choices.

If the player chooses '1. Greet', Red Hood tells the wolf that she is on her way to visit her granny. After learning of Red Hood's plan, the wolf recommends that Red Hood pick flowers from the garden. Red Hood accepts this and moves to the garden. Meanwhile, the wolf goes to the granny's house before Red Hood and eats the granny. On the other hand, if the player chooses '2. Ignore', the conversation between the wolf and Red Hood is omitted. The wolf leaves on its own path, and Red Hood arrives at the granny's house safely.

B. Unity AI Planner

The Unity AI Planner can automatically generate storylines (i.e., plans) based on the actions of AI agents. The planner is mainly composed of three elements. First, a trait is the primary data used by the planner to define the problem to solve. Each trait consists of specific features of an object. For example, in our content, the main ending (problem definition) is that while Red Hood spends time in a flower garden, a wolf eats her granny. To create the corresponding story, the planner requires traits such as Red-hood, Wolf, Granny, Granny's house, Flower, and Hunger. Finally, by combining the states of the traits, the problem that represents the main ending is defined.

The second is the goal state, which represents the end of the story. It is defined as a conditional check on the values of one or more traits and attributes. As shown in Figure 2, we set the goal state as the story's ending by combining and defining various traits and attribute values. For example, the goal state of the Crossroad Plan – the beginning of the story, where Red Hood arrives at a crossroad – is defined as 'Red-hood has a basket,' 'Red-hood's position equals the crossroad' and 'Wolf's position equals the crossroad.' If the plan is executed, the option '1. Greet' or '2. Ignore' appears, as shown in Figure 2. The goal state of the Sad Ending Plan – the wolf eats the granny – is defined as 'Red-hood has a flower,' 'Wolf's position equals granny's house' and 'Red-hood's position equals granny's house.' The Happy Ending Plan, in which Red Hood ignores the wolf and arrives at the granny's house safely, is the same as the Sad Ending Plan, but the condition 'Red-hood has a flower' is removed.

Fig. 2. Example of the goal states of three story plans.

The last is action. Actions refer to the plan steps that the AI agent performs to achieve a given goal. For example, every AI agent performs a 'Move To' action to move to a specific place. In addition, each agent can have different actions. For instance, Red-hood can perform a 'Pick Up' action to collect flowers, and Wolf can perform an 'Eat' action to satisfy its hunger.
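To give a rough idea of what such a planner does, the following is a generic forward-search sketch in Python – not the Unity AI Planner API. The fact names (e.g., `redhood_at_crossroad`) and the three actions are illustrative simplifications of the domain described above.

```python
from collections import deque

# Each action: (preconditions, facts added, facts removed).
# A state is a frozenset of facts.
ACTIONS = {
    "move_to_crossroad": (frozenset({"redhood_at_home"}),
                          frozenset({"redhood_at_crossroad"}),
                          frozenset({"redhood_at_home"})),
    "pick_flower":       (frozenset({"redhood_at_crossroad"}),
                          frozenset({"redhood_has_flower"}),
                          frozenset()),
    "move_to_granny":    (frozenset({"redhood_at_crossroad"}),
                          frozenset({"redhood_at_granny"}),
                          frozenset({"redhood_at_crossroad"})),
}

def plan(initial, goal):
    """Breadth-first search for an action sequence reaching the goal state."""
    frontier = deque([(frozenset(initial), [])])
    visited = {frozenset(initial)}
    while frontier:
        state, seq = frontier.popleft()
        if goal <= state:          # all goal facts satisfied
            return seq
        for name, (pre, add, rem) in ACTIONS.items():
            if pre <= state:       # action applicable
                nxt = (state - rem) | add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, seq + [name]))
    return None
```

For example, a goal state containing both 'Red-hood has a flower' and 'Red-hood at granny's house' yields a plan that first detours through the flower garden, mirroring the Sad Ending Plan above.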


C. Point of View

Our content provides players with two points of view (POV). The primary one is the third-person perspective, which helps the players understand all of the current story's information. The other is the first-person POV from Red Hood's perspective, which hides some information from the players. For example, if the players choose the Sad-Ending-Plan story from a third-person POV, they can see the wolf going to the granny's house while Red Hood picks up flowers. We assume that the third-person POV can induce more suspense in the player by providing critical information that is hidden from the main character (i.e., Red Hood).

Fig. 3. Example of two different POVs

On the other hand, if the players choose the same story from a first-person perspective, they cannot see that the wolf is going to Granny's house while they pick up flowers. When the players arrive at the granny's house with the flowers, they will find out that the wolf is pretending to be the granny after eating her. We assume that the first-person POV can evoke surprise in the player by deliberately hiding critical information.

IV. CONCLUSION

This paper presents planning-based interactive story generation for VR with two types of ending (Sad-Ending Plan, Happy-Ending Plan) depending on the player's choice. With the Unity AI Planner, we adapt the well-known fairy tale 'Little Red Riding Hood,' implementing the VR story with two different POVs – the first-person perspective from Red Hood and the third-person perspective. We assume that the two different points of view can induce dissimilar emotions (e.g., suspense or surprise) due to information filtering.

In future work, we plan to enhance the current story plans by adding more story events with different endings. We will then conduct a user study to investigate the player's immersion and emotions depending on the change of the player's POV.

ACKNOWLEDGMENT

This work was supported by the National Research Foundation of Korea (NRF) grant (2017R1A2B4010499, 2021R1A2C1012377) funded by the Korea government (MSIT).

REFERENCES

[1] A. Brannan, "An In-Depth Analysis of Sunspring (2016), The Short Film Written By A Computer," CineFiles Movie Reviews, June 2016.
[2] L. Leppänen, M. Munezero, M. Granroth-Wilding, and H. Toivonen, "Data-driven news generation for automated journalism," Proceedings of the 10th International Conference on Natural Language Generation, 2017.
[3] A. Amato, "Procedural content generation in the game industry," in Game Dynamics, Springer, Cham, pp. 15-25, 2017.
[4] E. J. Hastings, R. K. Guha, and K. O. Stanley, "Automatic content generation in the galactic arms race video game," IEEE Transactions on Computational Intelligence and AI in Games, 1(4), 245-263, 2009.
[5] Y. G. Cheong, M. O. Riedl, B. C. Bae, and M. J. Nelson, "Planning with applications to quests and story," in Procedural Content Generation in Games, pp. 123-141, Springer, Cham, 2016.
[6] X. Peng, S. Li, S. Wiegreffe, and M. Riedl, "Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning," arXiv:2105.01311, 2021.
[7] T. El-Ganainy and M. Hefeeda, "Streaming virtual reality content," arXiv:1612.08350, 2016.
[8] B. C. Bae, D. G. Kim, and G. Seo, "A study of VR interactive storytelling for empathy," Journal of Digital Contents, 18(8), 1481-1487, 2017.
[9] J. Porteous, "Planning technologies for interactive storytelling," in Handbook of Digital Games and Entertainment Technologies, Springer, 2016.
[10] M. O. Riedl and R. M. Young, "An intent-driven planner for multi-agent story generation," in Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 1, pp. 186-193, 2004.
[11] R. S. Aylett, S. Louchart, J. Dias, A. Paiva, and M. Vala, "FearNot! – an experiment in emergent narrative," in International Workshop on Intelligent Virtual Agents, pp. 305-316, Springer, Berlin, Heidelberg, 2005.
[12] J. Porteous, M. Cavazza, and F. Charles, "Applying planning to interactive storytelling: Narrative control using state constraints," ACM Transactions on Intelligent Systems and Technology (TIST), 1(2), 1-21, 2010.
[13] J. Lebowitz and C. Klug, Interactive Storytelling for Video Games: A Player-Centered Approach to Creating Memorable Characters and Stories, Taylor & Francis, 2011.
[14] M. Mateas and A. Stern, "Structuring Content in the Façade Interactive Drama Architecture," in AIIDE, pp. 93-98, 2005.
[15] Reigns (2016) [Internet]. Available: https://www.devolverdigital.com/games/reigns
[16] B. R. Jin, S. H. Kim, Y. S. Ju, M. S. Cha, and Y. W. Lee, "VR contents for stray cat experience for preventing abuse of stray cats," Proceedings of the Korean Institute of Information and Communication Sciences Conference, 24(1), 182-184, 2020.
[17] K. S. Kim and Y. H. Park, "A Study on the Development of a VR Game to Experience the Fire Evacuation and its Effect Analysis," Journal of Korea Game Society, 20(2), 153-162, 2020.

An Emotion Detection based Models using Electroencephalogram (EEG)

Wajiha Ilyas, Anum Moin, Muzammil Noor, Maryam Bukhari
Department of Computer Science, COMSATS University Islamabad, Attock Campus, Pakistan

Abstract—Emotion plays an essential role in building good communication between persons or between a person and a machine. In this research, we propose a solution to detect human emotion using machine learning approaches with electroencephalogram (EEG) signals. These emotions are classified into four different classes. In the first step, the data is preprocessed, and then frequency bands are extracted using Power Spectral Density (PSD), together with three statistical features from the Discrete Wavelet Transform (DWT), the Hjorth parameters, and two further statistical features. For classification, we use a multi-layer perceptron, a decision tree, and KNN. The dataset used in this work is the DEAP dataset, in which the EEG signals of 32 participants were recorded. There are a total of 40 videos present in the dataset (60 seconds per video). The classification accuracy of the four classes using the multi-layer perceptron is 77.6% (sad), 79.1% (calm), 76.8% (anger), and 66.4% (happy). Similarly, using the decision tree we achieved 78.9%, 79.4%, 77.2%, and 68.5% accuracy, while using KNN the accuracy values are 90.2%, 90.3%, 89.1%, and 86.5%, respectively.

Keywords— Decision Tree; EEG; Discrete Wavelet Transform; KNN

I. INTRODUCTION

Brain-computer interfaces utilize brain signals and combine them with artificial intelligence to perform tasks like emotion detection and stress detection. An electroencephalogram (EEG) is a test in which electrodes are attached to the scalp to record the activities and electric impulses of the neurons. The history of brain-computer interfaces (BCI) developed from a mere idea in the days of early digital technology to today's highly sophisticated approaches for signal detection, recording, and analysis [1]. Emotion recognition is a trending topic in the fields of physiology, research, and health care for recognizing the mental condition of a patient. People can express their emotions through facial expression, speech, or non-verbal behavior, except in some disorder cases like alexithymia, in which the patient has difficulty expressing their emotions.

In this paper, we use EEG signals from the DEAP dataset to classify four types of emotions, i.e., happy, sad, anger, and calm. After applying preprocessing techniques, four frequency bands are extracted using Power Spectral Density (PSD), along with three statistical features from the Discrete Wavelet Transform, the Hjorth parameters, and two statistical parameters. For classification, we use a multi-layer perceptron with 10-fold cross-validation. Moreover, a fine decision tree and weighted KNN with 10 neighbors are also used. We use arousal and valence quadrants to divide emotion into four classes, as shown in Figure 1. The rest of the paper is organized as follows: Section II presents the related work, and Section III presents the proposed framework, followed by the results and conclusion.

Figure 1: Arousal and valence quadrants

II. RELATED WORK

In the existing literature, most researchers use the DEAP dataset for emotion detection. Using the discrete wavelet transform, they extract EEG frequency bands as well as other features like time domain, power, energy, differential entropy, and the Hurst exponent to classify four classes of emotions. Further, they use traditional machine learning classifiers such as SVM to classify the data, with accuracies of 79%, 76%, 77% and 74%. Moreover, a fusion of channels with a channel-wise SVM classifier seems to improve accuracy. Khateeb et al. [2] use time domain, frequency domain, and wavelet domain features followed by a concatenation of all the features. They give these extracted features to an SVM to classify nine emotions (happy, pleased, relaxed, excited, neutral, calm, distressed, miserable, and depressed). In their work, SVM with 10-fold and leave-one-out cross-validation techniques is used, and the obtained accuracy is 65.92%. This accuracy was further improved by using wavelet entropy and Hjorth parameters. Saxena et al. [3] use the Fast Fourier Transform (FFT) and Principal Component Analysis (PCA) for feature extraction. Further, they use K-nearest neighbors (KNN) to classify the data. Seal et al. [4] record the EEG signals of 44 volunteers (23 female, 21 male)


while watching 12 videos (3 videos on each subject) and then classify four emotional states, i.e., happy, fear, sad and neutral. In their work, the Discrete Wavelet Transform is also used to extract sub-bands, and for classification and channel selection an Extreme Learning Machine (ELM) is deployed. Ismael et al. [5] use a two-stepped majority voting approach to classify two binary classes, i.e., high arousal vs. low arousal and high valence vs. low valence. Wavelet-based entropy and fractal dimension features are extracted first, followed by the KNN algorithm. The top five channels are used to get the per-rhythm EEG predictions in majority voting, and all the EEG rhythmic predictions are used in the second stage to get the final predictions. They achieved 85% accuracy for the first binary class and 86.3% accuracy for the second binary class, respectively.

III. METHODOLOGY

In this research, machine learning algorithms with a one vs one approach are used to classify emotion into four classes, i.e., happy, sad, calm, and anger, using the DEAP dataset. In the first step, we preprocess the raw signals. Then features are extracted using the power spectral density, the discrete wavelet transform, the Hjorth parameters, and two statistical features. Afterward, we combine all the features and then classify them using a one vs one approach. Moreover, we use a multi-layer perceptron, a decision tree, and KNN for classification. The overview of the proposed methodology is shown in Figure 2.

A. Acquire Signal

We use the DEAP dataset [6] in this research. There are 32 participants with an average age of 27 (between 19 and 37) who watched videos; their emotions were recorded using a BCI headset. In this experiment, the participants consist of 50% male and 50% female. Each participant watched 40 videos of 60 seconds each.

FIGURE 2: Flowchart of proposed Work
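The one vs one scheme mentioned in the methodology can be sketched as follows: one binary classifier is trained per pair of emotion classes, and a majority vote over all pairwise predictions picks the final class. The `nearest_centroid_trainer` below is a toy stand-in for the actual classifiers (MLP, KNN, decision tree), included only to make the sketch self-contained.

```python
from itertools import combinations

def one_vs_one_fit(X, y, train_binary):
    """Train one binary model per unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        idx = [i for i, label in enumerate(y) if label in (a, b)]
        models[(a, b)] = train_binary([X[i] for i in idx], [y[i] for i in idx])
    return models

def one_vs_one_predict(models, x):
    """Each pairwise model votes for one of its two classes; majority wins."""
    votes = {}
    for model in models.values():
        winner = model(x)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

def nearest_centroid_trainer(Xs, ys):
    """Toy binary classifier: assign the class with the nearest feature mean."""
    groups = {}
    for xv, yv in zip(Xs, ys):
        groups.setdefault(yv, []).append(xv)
    cents = {c: sum(v) / len(v) for c, v in groups.items()}
    return lambda x: min(cents, key=lambda c: abs(x - cents[c]))
```

With four classes this trains six pairwise models, matching the number of class pairs for happy/sad/anger/calm.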

B. Preprocessing

There are many artifacts present in the raw signals. To remove these artifacts, we first downsample the data to 128 Hz and remove the EOG artifacts. Then a bandpass frequency filter of 4–45 Hz is applied to the data. Afterward, the data is segmented into 60-second trials. The size of the data is 40*40*8064 (videos/trials * EEG channels * data). Cerebral laterality, the frontal cortex, and the left and right hemispheres play a significant role in generating emotions, so we select 32 channels to obtain arousal and valence values, because they are placed in the areas that play an important role in emotion generation.

C. Feature Extraction

The feature extraction process uses Power Spectral Density (PSD), the Discrete Wavelet Transform (DWT), the Hjorth parameters (activity, mobility, complexity), and two statistical features (skewness, kurtosis). EEG signals are non-linear and non-stationary. Using PSD, we extract four frequency bands: alpha, delta, beta, and theta. The DWT uses both time and frequency domains; wavelet energy, entropy, and standard deviation are used as its statistical features. We make 10 wave segments for each trial of each individual participant, decompose the data into wavelets, and compute the wavelet energy, entropy, and standard deviation. The frequency ranges of the EEG bands are shown in Table II.

D. Classification

Three classifiers are used in this work: a multi-layer perceptron (a neural network), KNN, and a decision tree. The features


(energy, entropy, standard deviation, alpha, beta, delta, theta, activity, mobility, complexity, skewness, kurtosis) are given to the classifiers. Classes are identified using the values of arousal and valence.

a) Multi-layer Perceptron:
A multi-layer perceptron (MLP) is an artificial neural network belonging to the class of feedforward neural networks. It consists of three major layers: input, hidden, and output. In the first step, the MLP takes the inputs, then computes the dot product of the weight matrix with the given input, followed by an activation function to produce the output. The activation function used in this work is ReLU. At the end of each iteration, the weights associated with each node are updated using the backpropagation algorithm. This whole process is repeated for the set number of epochs. A simple MLP architecture is shown in Figure 3.

Figure 3: Multi-Layer Perceptron (MLP)

b) K-nearest neighbor (KNN):
KNN is a supervised machine learning algorithm used for both classification and regression problems. It predicts a new value by calculating the distance to all data points present in the training set. KNN is easy to implement, but it is slow for larger datasets. The major steps of the algorithm are described below:
• Given the input dataset, first select the number of neighbors K. This value must be an integer.
• Calculate the distance between the query data points in the test set and all instances of the training data. Distance metrics include the Hamming, Euclidean, and Manhattan distance.
• Sort the resulting distance values in ascending order.
• Pick the first K rows of the sorted values.
• Among the selected data points, pick the most frequent class and assign it to the query data point.

c) Decision Tree:
A decision tree is also a supervised machine learning algorithm used to solve both classification and regression problems. It has nodes and leaves: a node represents an attribute and a leaf represents a class label. It starts with the root node and compares the value of each attribute with the root attribute; based on that comparison, the tree is split or moves to the next node. For the root node, it selects the best attribute, then moves to the next subtree and selects the best attribute from the remaining ones. This process is repeated until every attribute is placed in its correct position in the tree. A simple pictorial representation of a decision tree is shown in Figure 4.

Figure 4: Decision Tree

There are two attribute selection measures in the decision tree: one is information gain and the other is the Gini index. The mathematical formulas are shown below:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)   (1)

Entropy(S) = − Σ_{i=1}^{C} p_i · log2(p_i)   (2)

Gini Index = 1 − Σ_{i=1}^{C} p_i^2   (3)

In the above equations, the proportion of examples belonging to class i is represented by p_i, and C is the number of classes in the target attribute. Values(A) represents the set of possible values for attribute A, while S_v denotes the subset of the collection S for which attribute A has value v.

IV. RESULTS

For the experiments, we perform 10-fold cross-validation with a multi-layer perceptron. All features are given as input to the MLP. The average accuracy we achieved over all classes is 75%. Similarly, we also perform experiments with weighted KNN. The value of K is set to 10, and the distance metric used is Euclidean. The average accuracy over all classes in this experiment is 89%. Moreover, we also use a decision tree with a maximum of 100 splits to perform the same experiment. Gini's diversity index is used as the split criterion in the decision tree. The average accuracy over all classes in this experiment is 76%. The class-by-class accuracies for all classifiers are shown in Figure 5. Furthermore, the raw and preprocessed EEG is shown in Table I.

Figure 5: Results in Pictorial Form
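The information gain and Gini index measures used by the decision tree can be written directly from their standard formulas. This is a generic sketch, not the paper's implementation; the `records` structure (a list of dicts) is chosen just for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum_i p_i * log2(p_i)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini_index(labels):
    # Gini = 1 - sum_i p_i^2
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(records, attr, label):
    # Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)
    labels = [r[label] for r in records]
    n = len(records)
    remainder = 0.0
    for v in {r[attr] for r in records}:
        subset = [r[label] for r in records if r[attr] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder
```

At each split the tree would pick the attribute with the highest information gain (or, equivalently, the lowest weighted Gini impurity).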


TABLE I: RAW AND PREPROCESSED SIGNAL (waveform plots of the raw signal and the preprocessed signal)

TABLE II: EEG FREQUENCY BANDS

Wave  | Frequency band (Hz) | Behavior trait
------+---------------------+------------------------
Delta | 0.3-4               | Deep sleep
Theta | 4-8                 | Deep meditation
Alpha | 8-13                | Eyes closed but awake
Beta  | 13-30               | Eyes open and thinking
Gamma | 30 and above        | Unifying consciousness
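The band-power and Hjorth features from the feature-extraction step can be sketched in plain Python as follows. A naive DFT is used here only for clarity (the actual pipeline uses PSD/DWT tooling); the band edges follow Table II, and the function names are illustrative.

```python
import math
from statistics import pvariance

BANDS = {"delta": (0.3, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(signal, fs):
    """Sum spectrum power (naive DFT) into each EEG band of Table II."""
    n = len(signal)
    powers = {b: 0.0 for b in BANDS}
    for k in range(1, n // 2):                 # positive frequencies, skip DC
        freq = k * fs / n
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        for band, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[band] += (re * re + im * im) / n
    return powers

def hjorth_parameters(signal):
    """Hjorth activity, mobility and complexity of a 1-D signal."""
    d1 = [b - a for a, b in zip(signal, signal[1:])]   # first difference
    d2 = [b - a for a, b in zip(d1, d1[1:])]           # second difference
    activity = pvariance(signal)
    mobility = math.sqrt(pvariance(d1) / activity)
    complexity = math.sqrt(pvariance(d2) / pvariance(d1)) / mobility
    return activity, mobility, complexity
```

For a pure 10 Hz sinusoid, essentially all power lands in the alpha band and the Hjorth complexity is close to 1, which is a quick sanity check for both functions.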

V. CONCLUSION

In this paper, we use the DEAP dataset to classify emotion into four classes (happy, sad, anger, and calm). We use a one vs one approach and then calculate the average accuracy of each classifier. We extract the alpha, beta, delta, and theta frequency bands using the power spectral density, and entropy, energy, and standard deviation using the discrete wavelet transform. All the extracted features are concatenated and given as input to the classifiers. We perform 10-fold cross-validation with the MLP. The average accuracy results achieved by our classifiers are 75.05% using MLP, 76.03% using KNN, and 75.8% using a decision tree. In the future, we will extend this work with more advanced deep learning techniques.

TABLE III. ACCURACY RESULTS OF ALL CLASSIFIERS

Classes | Multi-layer perceptron | KNN   | Decision tree
--------+------------------------+-------+--------------
Happy   | 66.4%                  | 86.5% | 68.5%
Sad     | 77.6%                  | 90.2% | 78.9%
Anger   | 76.8%                  | 89.1% | 77.2%
Calm    | 79.1%                  | 90.3% | 79.4%
Average | 75%                    | 89%   | 76%

REFERENCES

[1] A. Kübler, "The history of BCI: From a vision for the future to real support for personhood in people with locked-in syndrome," Neuroethics, 13(2), pp. 163-180, 2020.
[2] M. Khateeb, S. M. Anwar, and M. Alnowami, "Multi-Domain Feature Fusion for Emotion Classification Using DEAP Dataset," IEEE Access, 9, pp. 12134-12142, 2021.
[3] A. Saxena et al., "Emotion Detection Through EEG Signals Using FFT and Machine Learning Techniques," Springer Singapore, 2020.
[4] A. Seal et al., "An EEG Database and Its Initial Benchmark Emotion Classification Performance," Computational and Mathematical Methods in Medicine, 2020, p. 8303465, 2020.
[5] A. M. Ismael et al., "Two-stepped majority voting for efficient EEG-based emotion classification," Brain Informatics, 7(1), p. 9, 2020.
[6] S. Koelstra et al., "DEAP: A Database for Emotion Analysis Using Physiological Signals," IEEE Transactions on Affective Computing, 3(1), pp. 18-31, 2012.

A Trustable and Traceable Method for Secondhand Market based on Blockchain Technology with Committee Consensus

Dongkun Hou1, Yinhui Yi1, Jie Zhang1 and Ka Lok Man1-5
1 Xi'an Jiaotong-Liverpool University, Suzhou, China
2 Swinburne University of Technology Sarawak, Malaysia
3 KU Leuven, Belgium
4 Kazimieras Simonavičius University, Lithuania
5 Vytautas Magnus University, Lithuania

{Dongkun.Hou20, Yinhui.Yi20}@student.xjtlu.edu.cn {Jie.Zhang01, Ka.Man}@xjtlu.edu.cn

Abstract—The secondhand market has been promising in recent years, but trust problems have hindered its development. Much research solves these problems via third-party intermediaries, where the trust comes from the third-party platform's reputation. Some consumers, however, have no way to recognize the qualification of secondhand products, and it is difficult to safeguard their rights if the third-party platform is malicious. In this paper, consortium blockchain technology is employed to solve these problems, and a novel committee based Byzantine consensus is designed for the secondhand market. Committee nodes can validate published transactions, and other client nodes can also check the recorded transactions and update product states. The recorded transactions need to be validated by many committee nodes, and these transactions are transparent, permanent and traceable. Thus, buyers and sellers can trust the blockchain based secondhand market. Moreover, compared with a public blockchain, our consortium blockchain based secondhand market system can be more secure and efficient.

Index Terms—Secondhand Market, Consortium Blockchain, Trust Problems, Committee based Byzantine Consensus Mechanism

I. INTRODUCTION

The concept of the circular economy has become popular in recent years. Consumers are encouraged to reuse, recycle and resell useless but still serviceable products, which boosts secondhand market models [1]. Additionally, with the development of the Internet, the secondhand market has become a popular trade channel. According to a survey by GMI [2], a European marketing company, about 70% of consumers have bought secondhand products. However, the secondhand market suffers from trust problems that hinder its development. Compared to products sold directly by factories and shops, the quality of secondhand products cannot be guaranteed [3]. The quality of products cannot always fulfill buyers' expectations, and sellers may replace products with counterfeits or substandard products. In most cases, sellers are experts in the area of their products, while buyers do not have enough knowledge to evaluate product quality. If buyers seek help from a third-party platform, the cost may be higher than they expect.

In the secondhand market, platform reputation is the key factor affecting consumer purchase decisions and the financial performance of the company. Some research solves the trust problem by improving third-party reputation. Thakur [4] and Davide et al. [5] applied empirical analysis to research the impact of managers' responses on their online reputation. Resnick et al. studied the relationship between feedback and reputation systems on eBay and other shopping sites [6]. Guidance was proposed on how to build the reputation of the seller, thereby transforming reputation into trust. Huynh et al. developed a trust and reputation platform integrating many sources to comprehensively evaluate the possible performance of agents [7]. These methods are based on trust in the third-party platform, but safeguarding long-term credibility is difficult, because it is possible that a third-party platform is reliable at first and then becomes malicious.

Blockchain, as a distributed ledger, can reduce the trust problem caused by asymmetric product quality, and it has also been widely employed in many fields [8]. For example, blockchain is employed to share and protect healthcare management data [9]. Patients' disease records are uploaded to the blockchain to provide sharing, accessibility, security, cost reduction and traceability. Blockchain can provide security, transparency, speed and cost reduction for supply chains and trade [10]. Supply records of goods and transactions can be stored on the blockchain for better tracking and verification. In addition, Shen et al. studied the value of blockchain in disclosing the quality of secondhand products in the supply chain. Under the blockchain, suppliers and consumers can achieve a win-win situation [11].

In this project, therefore, consortium blockchain technology is introduced to cope with trust problems in the secondhand market. A committee based Byzantine consensus mechanism is developed instead of common consensus algorithms [12].


The secondhand product information and transactions are stored in the blockchain network, and they are transparent to all of the clients. Moreover, the characteristics of blockchain, such as transparency and tamper-resistance, can help buyers look up the transaction history of a secondhand product and identify specific sellers with misbehaviors.

This report first introduces the related blockchain technology of the project in Section II. Section III describes the detailed design of the system, and Section IV discusses the novel system. The final section concludes with the limitations of the system and future work.

II. BLOCKCHAIN TECHNOLOGY

Blockchain technology, as a distributed ledger, combines many computer technologies to guarantee its security, including cryptography, peer-to-peer networking, distributed storage, consensus algorithms, digital signatures and smart contracts. Blockchain is also transparent, traceable, tamper-proof and anonymous [13]. A consensus mechanism helps most of the members on the blockchain reach an agreement on recording transactions and updating the ledger [14]. Generally, a consensus algorithm connects large numbers of nodes to make them work together and cope with conflicts, and it also gives the blockchain the capability of fault tolerance.

Blockchain can be divided into public, private and consortium blockchains [15]. Everyone on the network can access a public blockchain; therefore, anyone can participate in the consensus and append blocks. A private blockchain is centralized and can be controlled by one organization or person. A consortium blockchain is controlled by many super nodes; blocks can be appended if some of the super nodes confirm them.

In a public blockchain, a proof-based consensus algorithm such as Proof of Work (PoW) is employed. Nodes must prove their ability in order to obtain the right to append blocks to the chain, and blocks appended to the blockchain need to be checked by all nodes in the network. In a consortium blockchain, blocks are verified and appended by super nodes, so it is more secure, and it can also improve communication efficiency and reduce power consumption by employing Practical Byzantine Fault Tolerance (PBFT) or Delegated Proof of Stake (DPoS) [16].

However, PoW needs a long time to confirm generated blocks and wastes computing resources. PBFT can tolerate up to 1/3 malicious participants and confirm transactions quickly, but its efficiency is also reduced when there are many participating nodes. In this paper, considering the cost of computation and communication, we design a committee-based Byzantine consensus mechanism and select a committee to represent all participants in reaching consensus. In the stable state, the committee nodes apply PBFT to reach consensus. In the election state, the nodes with more transactions become the next committee nodes.

III. SYSTEMATIC DESIGN

The blockchain network of the secondhand market consists of committee nodes and client nodes. Blockchain, as a public, transparent, tamper-proof ledger, is employed to keep the record of the secondhand products. This part introduces the architecture of the system and then describes a novel committee consensus mechanism based on the secondhand market. Finally, the process of the use case is illustrated.

A. Architecture

Fig. 1. Architecture of blockchain based secondhand market system.

Blockchain can replace the secondhand market intermediary. The function of a third-party intermediary can be realized through the smart contract, and the transactions can also be recorded in the blockchain. The structure of the system is shown in Fig. 1. Client nodes, including buyers and sellers, can send information and transactions to a committee node in the secondhand market system. The uploaded information needs to be verified by the committee node against the client's invoices or photos, and it is transferred to the other committee nodes if the information is correct. Additionally, the InterPlanetary File System (IPFS) should be employed to create IPFS hash values of the invoices and photos in the smart contract [17].

When sellers want to publish a new secondhand product on the blockchain, they need to initialize the product information, including date of purchase, number of trades, quality of the product, expected selling price, etc. Then they can publish the product on the blockchain network and wait for the confirmation of the information by the committee nodes. After a buyer pays for the product, the seller can send the product to the buyer, and the seller also needs to update the secondhand product state.

Buyers can view the published product information, such as date of purchase, number of trades and quality of the product. Buyers need to contact sellers if they want to purchase their products. The buyer should pay for the product and change the product state in the blockchain, and then modify the state again once they receive the product.

B. Committee Consensus

The consensus mechanism gives a node the right to append blocks to the blockchain. Considering the cost of consensus computing and communication, we propose an efficient and secure committee-based consensus mechanism, in which transactions need to be checked before being added to the blockchain.
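Since PBFT tolerates at most one third of the committee being Byzantine, the committee size fixes the fault budget. A minimal sketch of that arithmetic (illustrative only, not code from the paper):

```python
def pbft_bounds(n: int):
    """For a PBFT committee of n nodes, return (f, quorum):
    f      -- maximum number of Byzantine (faulty) nodes tolerated,
    quorum -- number of matching replies needed to commit (2f + 1).
    PBFT requires n >= 3f + 1, hence f = (n - 1) // 3."""
    f = (n - 1) // 3
    quorum = 2 * f + 1
    return f, quorum

# A 4-node committee tolerates 1 faulty node; a 10-node committee tolerates 3.
print(pbft_bounds(4), pbft_bounds(10))
```

This is why the smallest useful PBFT committee has four nodes: with three or fewer, even a single Byzantine member breaks the one-third bound.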

38 2021 International Conference on Digital Contents: AICo(AI, IoT and Contents) Technology (DigiCon-21)

In this case, some honest nodes form a committee to check the transactions and block generation. At the same time, the client nodes, including the buyer and the seller, send the status information of the product to the committee nodes. The committee then verifies the information. Only information with sufficient proof can be packaged on the blockchain.

The committee nodes are changed based on the number of client node transactions. At the beginning of the next round of elections, the clients with the most transactions become the new committee. The committee is never the same, which guarantees the security of the blockchain. Additionally, PBFT is employed among the committee nodes, which means that the committee can tolerate up to 1/3 Byzantine nodes.

C. Process of transaction

Fig. 2. Use case in blockchain based secondhand market system.

Client nodes include buyers and sellers in the network. They have the capability to send the transaction data in blocks. Fig. 2 describes a sequence of transactions in the blockchain based secondhand market system.

1) Create product. Sellers create the secondhand product information and publish it in the blockchain network. They need to initialize the product information, including date of purchase, number of trades, quality of the product, expected selling price, etc.

2) Verify product. Committee nodes need to verify the submitted secondhand information by photos or invoices.

3) View and pay for product. Buyers can view the published product information, such as date of purchase, number of trades and quality of the product. Buyers need to contact sellers if they want to purchase their products. The buyer should pay for the product and change the product state to "paid" in the blockchain if both agree on the price.

4) Verify payment. Committee nodes should verify the submitted product payment information by photos or other evidence.

5) Send product. The seller can send the product to the buyer once the payment is confirmed in the blockchain, and then the seller needs to update the secondhand product state to "sent".

6) Verify express. Committee nodes should verify the uploaded "sent" state by photos or the express record.

7) Receive product. The buyer should also modify the state to "received" once they receive and check the product. In addition, if the product is damaged, the buyer can reject the product, change the product state to "reject" and submit the evidence to the committee nodes.

8) Twice selling. Buyers can change the product state to "selling" if they want to sell the secondhand product again.

IV. DISCUSSION

Compared with a public blockchain, in the consortium only a few nodes verify the transactions instead of broadcasting to every node to reach a consensus, so this system is more efficient. During the selection of committee nodes, the smart contract selects the nodes with higher transaction numbers to form the next round of committee nodes. Because these nodes are highly dependent on the system, they are unlikely to be malicious. Moreover, PBFT allows up to 1/3 fault tolerance.

In addition, even though users can easily track product information and the number of item transactions in the blockchain, products are still delivered offline. When sellers and buyers need to update the product states on the blockchain, committee nodes must carefully check the correctness of the updated information against their photos or invoices. Besides, the security of our system depends on the PBFT consensus mechanism of the consortium blockchain. It is also risky if malicious participants with more transactions become committee nodes and their number exceeds one third of the total committee nodes.

It is worth noting that the verification information is a key component of the committee mechanism. Therefore, some external technologies can also be applied for auxiliary verification, such as OCR invoice recognition [18].

V. CONCLUSION

The secondhand market has shown promise in recent years, but its trust problem has hindered its development. In this paper, a consortium blockchain based secondhand market is designed to solve these problems, and a novel committee based Byzantine consensus is developed. Committee nodes can validate published transactions, and other client nodes can also check the recorded transactions and update product states. Compared with a public blockchain, the consortium blockchain based secondhand market system can be more secure and efficient.

VI. ACKNOWLEDGMENT

Jie Zhang wishes to thank the National Natural Science Foundation of China under Grant No. 62002296; the Natural Science Foundation of Jiangsu Province under Grant No. BK20200250; the XJTLU Key Programme Special Fund under Grant No. KSF-E-54. Ka Lok Man wishes to thank the


AI University Research Centre (AI-URC), Xi'an Jiaotong-Liverpool University, Suzhou, China, for supporting his related research contributions to this article through the XJTLU Key Programme Special Fund (KSF-E-65) and Suzhou-Leuven IoT & AI Cluster Fund.
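Returning to the transaction process of Section III.C: the eight steps imply a small state machine for each on-chain product record. A minimal sketch, where the state names are taken from the steps and everything else (the initial "selling" state, the function name) is an illustrative assumption rather than the paper's implementation:

```python
# Allowed product-state transitions implied by steps 1)-8):
# selling -> paid -> sent -> received -> selling (resale, step 8)
#                         -> reject (damaged product, step 7)
TRANSITIONS = {
    "selling": {"paid"},
    "paid": {"sent"},
    "sent": {"received", "reject"},
    "received": {"selling"},
    "reject": set(),
}

def update_state(current: str, new: str) -> str:
    """Return the new state if the transition is allowed, else raise."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new

# A full first-sale lifecycle ending in a resale listing:
s = "selling"
for nxt in ("paid", "sent", "received", "selling"):
    s = update_state(s, nxt)
```

In the paper's design, the committee nodes would gate each transition (verify payment, verify express, etc.) before it is packaged on chain; the table above only captures which transitions exist.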

REFERENCES

[1] K. T. Hansen and J. Le Zotte, "Changing secondhand economies," 2019.
[2] M. Chahal, "The second-hand market: What consumers really want to buy," Marketing Week, Oct. 21, 2013.
[3] A. G. Fernando, B. Sivakumaran, and L. Suganthi, "Comparison of perceived acquisition value sought by online second-hand and new goods shoppers," European Journal of Marketing, 2018.
[4] S. Thakur, "A reputation management mechanism that incorporates accountability in online ratings," Electronic Commerce Research, vol. 19, no. 1, pp. 23-57, 2019.
[5] D. Proserpio and G. Zervas, "Online reputation management: Estimating the impact of management responses on consumer reviews," Marketing Science, vol. 36, no. 5, p. 645, 2017.
[6] R. Paul and Z. Richard, Trust among strangers in internet transactions: Empirical analysis of eBay's reputation system. Emerald Group Publishing Limited, 2002.
[7] T. D. Huynh, N. R. Jennings, and N. R. Shadbolt, "An integrated trust and reputation model for open multi-agent systems," Autonomous Agents and Multi-Agent Systems, vol. 13, no. 2, p. 119, 2006.
[8] V. Babich and G. Hilary, "What operations management researchers should know about blockchain technology," SSRN Electronic Journal, available at: http://dx.doi.org/10.2139/ssrn.3131250, 2018.
[9] D. Metcalf, J. Bass, M. Hooper, V. Dhillon, and A. Cahana, Blockchain in Healthcare: Innovations that Empower Patients, Connect Professionals and Improve Care. Taylor & Francis, 2019.
[10] N. Vyas, A. Beije, and B. Krishnamachari, Blockchain and the Supply Chain: Concepts, Strategies and Practical Applications. Kogan Page, 2019.
[11] B. Shen, X. Xu, and Q. Yuan, "Selling secondhand products through an online platform with blockchain," Transportation Research Part E: Logistics and Transportation Review, vol. 142, p. 102066, 2020.
[12] Y. Meng, Z. Cao, and D. Qu, "A committee-based byzantine consensus protocol for blockchain," in 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), pp. 1-6, 2018.
[13] Q. Feng, D. He, S. Zeadally, M. K. Khan, and N. Kumar, "A survey on privacy protection in blockchain system," Journal of Network and Computer Applications, vol. 126, pp. 45-58, 2019.
[14] S. Nakamoto, "Bitcoin: A peer-to-peer electronic cash system," tech. rep., Manubot, 2019.
[15] D. Mingxiao, M. Xiaofeng, Z. Zhe, W. Xiangwei, and C. Qijun, "A review on consensus algorithm of blockchain," in 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2567-2572, IEEE, 2017.
[16] Z. Li, J. Kang, R. Yu, D. Ye, Q. Deng, and Y. Zhang, "Consortium blockchain for secure energy trading in industrial internet of things," IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3690-3700, 2017.
[17] M. Steichen, B. Fiz, R. Norvill, W. Shbair, and R. State, "Blockchain-based, decentralized access control for IPFS," in 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 1499-1506, IEEE, 2018.
[18] H. T. Ha, Z. Nevěřilová, A. Horák, et al., "Recognition of OCR invoice metadata block types," in International Conference on Text, Speech, and Dialogue, pp. 304-312, Springer, 2018.

A Feasibility Study of a Security Mechanism for Blockchain-Powered Intelligent Edge

Jie Zhang1, Ka Lok Man1-5, Young B. Park6, Jieming Ma1, and Xiaohui Zhu1
1Xi'an Jiaotong-Liverpool University, Suzhou, China
2Swinburne University of Technology Sarawak, Malaysia
3imec-DistriNet, KU Leuven, Belgium
4Kazimieras Simonavicius University, Lithuania
5Vytautas Magnus University, Lithuania
6Department of Software Science and Engineering, Dankook University, Yongin, S. Korea
{Jie.Zhang01, Ka.Man, Jieming.Ma, Xiaohui.Zhu}@xjtlu.edu.cn
[email protected]

Abstract— Blockchain-powered intelligent edge is a new paradigm which integrates blockchain and edge computing into the Intelligent Internet of Things (IIoT). It enables the edge-devices to process data for end-devices on behalf of the cloud center, and it also allows the cloud center to trace data via blockchain. It is expected to address some obstinate problems in many IoT applications, such as data management in intelligent buildings, data privacy in e-health, response delay in self-driving cars, etc.

However, security remains a significant problem which has not been extensively studied so far in the blockchain-powered intelligent edge. This paper presents a feasibility study of a novel security mechanism for the blockchain-powered intelligent edge. The security mechanism targets three specific issues: (1) side-channel (or leakage) attacks, (2) quantum attacks, and (3) heavy computing burden. It consists of a suite of leakage-resilient, post-quantum and lightweight security algorithms and protocols, including public-key encryption (PKE) algorithms, digital signature algorithms and authenticated key agreement (AKA) protocols.

Keywords—post-quantum cryptography, leakage-resilient cryptography, blockchain, intelligent edge

I. INTRODUCTION

Recent research on IIoT emphasizes the significance of the intelligent edge layer. As the middle layer between the cloud layer and the end layer, the intelligent edge can undertake computing tasks for end-devices on behalf of the cloud center. Compared with the traditional IoT paradigm, which uploads data from end-devices to the remote cloud center for processing, the new paradigm has a number of advantages, such as reducing response time and alleviating security and privacy issues during uploading. However, it also makes it difficult for the cloud center to trace data. Under this background, blockchain is introduced into the intelligent edge layer to enable traceability. The integration of blockchain and intelligent edge is studied in some recent works [1-6], which have established the framework of blockchain-powered intelligent edge.

However, the security problem in the blockchain-powered intelligent edge has not been specifically studied so far. Although a wide range of security solutions based on cryptography has been playing a significant role in many scenarios of IoT, none of the existing schemes can effectively address the following issues in the blockchain-powered intelligent edge.

First, side-channel attacks are increasingly threatening cloud servers, edge-devices and end-devices. Long-term shared secret keys or private keys underlie the security of cryptography-based security solutions. Most cryptographic primitives in use assume that the long-term secrets are secure from attacks. However, side-channel attacks can leak the long-term secrets in physical manners [7-9]. The situation is increasingly severe as more and more side-channels [10-12] are exploited, and machine learning techniques are used to analyze side-channel information [13-15].

Second, many of the public-key cryptosystems in use will be broken if large-scale quantum computers are ever built. Public-key cryptosystems are based on the intractability of mathematical problems. However, quantum computers exploit quantum mechanical phenomena to solve mathematical problems that are difficult or intractable for conventional computers [16]. This will seriously compromise the confidentiality and integrity of digital communications in the blockchain-powered intelligent edge.

Finally, many end- and edge-devices are constrained in terms of resources, energy and computing capability due to cost and size considerations. However, security solutions based on advanced cryptographic algorithms require heavy computation and will overburden these devices.

II. BACKGROUND

A. Side-channel attacks and leakage-resilient cryptography

Side-channel attacks are increasingly threatening cryptographic algorithms. The first side-channel attack occurred in 1956, when the British intelligence organization MI5 broke a cipher used by the Egyptian Embassy in London using information leaked through the sound side channel. The academic study of side-channel attacks started formally in the late 1990s [18]. After more than 20 years, research on side-channel

attacks has become a significant topic in cryptography. More and more side-channels have been proposed [19-29], threatening most symmetric cryptographic algorithms, public-key cryptographic algorithms and key agreement protocols in use.

To resist side-channel attacks, a number of leakage-resilient methods have been proposed, and leakage-resilient cryptography has developed. Shin et al. [30] proposed two password-based and leakage-resilient key exchange protocols using [n, n] and [2, 2] threshold secret sharing. Ruan et al. [31] presented a leakage-resilient and password-based Diffie-Hellman key exchange protocol. Ruan et al. [32] proposed a leakage-resilient identity-based key exchange protocol. However, these available schemes are not applied in practice due to their heavy cost, and most key exchange protocols in use, such as the protocols in Transport Layer Security (TLS) 1.3, Bluetooth 5.0, 5G, etc., still cannot resist side-channel attacks.

B. Quantum attacks and post-quantum cryptography

Quantum computing is threatening the security of the public-key cryptosystems currently in use. Therefore, since 2016, NIST has run a process to solicit, evaluate, and standardize quantum-resistant public-key cryptographic algorithms, including public-key encryption, key establishment, and digital signatures. The Round 3 candidates were announced on July 22, 2020. In this round, Supersingular Isogeny Key Encapsulation (SIKE) was selected as one of the alternate candidates. It is based on the Supersingular Isogeny Diffie-Hellman (SIDH) scheme, which was introduced by Jao and De Feo in 2011. The proposed security mechanism will use the supersingular isogeny method to realize quantum resistance.

C. Unbalancing computation in cryptographic protocols

There are two commonly used methods to make cryptographic protocols lightweight. The first is offline computation, which allows the weak party to pre-compute some values that will be used during the protocol execution. Wong and Chan [33] proposed a lightweight key agreement protocol that allows the initiator to pre-compute some values, so the online computing time is reduced. However, this method has three weaknesses. First, it is only suitable for scenarios where the initiator is less powerful than the responder. Second, the overall computation is not changed. Finally, the pre-computed secret values have to be stored in the memory of the device and thus are vulnerable to ephemeral-secret-leakage attacks [34-36].

The second method is unbalancing the computations between the two parties. It transfers some time-consuming computations from a weak party to its more powerful communicating partner. Several lightweight cryptographic protocols are designed based on this method [37-40]. The proposed security mechanism will use this method to realize lightweightness.

III. PROPOSED METHOD

We propose a novel security mechanism for the blockchain-powered intelligent edge. The following three groups of algorithms/methods will be presented for the security mechanism:

(1) Leakage-resilient cryptographic algorithms: Leakage-resilient methods will be investigated and applied to existing cryptographic algorithms. These methods split a long-term secret into several secret states which are stored in separate memories, independently used in computations, and refreshed after use. The original long-term secret is destroyed, and the leakage of secret states will not compromise the secrecy of the long-term secret.

(2) Post-quantum public-key cryptographic algorithms: Public-key cryptosystems based on supersingular isogenies will be designed and developed. Their security rests on the difficulty of finding isogenies between supersingular elliptic curves. The fastest known quantum attack on this problem remains exponential, which means public-key cryptographic algorithms based on this problem are quantum-resistant.

(3) Lightweight methods for leakage-resilient and post-quantum cryptographic algorithms: Elliptic Curve Cryptography (ECC) algorithms and computing-unbalancing methods will be investigated and applied to the leakage-resilient and post-quantum cryptographic algorithms. ECC algorithms can achieve high security with short secret keys and are thereby friendly to capability-constrained end- and edge-devices in blockchain-powered intelligent edge environments. The computing-unbalancing methods can transfer some computing tasks from a constrained device to a more powerful one. The combination of these two methods with the leakage-resilient and post-quantum cryptographic algorithms will lead to a security mechanism with strong security and high efficiency.

Taking the following AKA protocol as an example, we briefly illustrate how to design a leakage-resilient, post-quantum and lightweight AKA protocol using the above algorithms and methods for the security mechanism.

Denote the base point of the elliptic curve group by P, the client by C, the server by S, and the long-term private and public keys of C and S by SK_C, PK_C, SK_S, PK_S. The original AKA protocol works as follows:

Step 1. C → S: U_C, where U_C = R_C + SK_C for a random value R_C.

Step 2. S → C: U_S, Sig_S, where U_S = R_S + SK_S for a random value R_S; Sig_S is the digital signature on the string U_C || U_S || K_x under SK_S, and K_x is the x-coordinate of K = R_S · (U_C · P − PK_C).

Step 3. C → S: Sig_C, where Sig_C is the digital signature on the string U_S || U_C || K_x under SK_C, and K_x is the x-coordinate of K = R_C · (U_S · P − PK_S).

Step 4. If the verification of the digital signatures and message authentication codes succeeds, then both sides derive the shared key from K using some key derivation function.
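To make the algebra concrete, the sketch below runs the key-agreement steps above (without the signatures) over a toy additive group: integers modulo a prime stand in for the elliptic-curve group, and "scalar · point" is modular multiplication. Under this reading, U_C and U_S are scalars, PK = SK · P, and both sides obtain K = R_C · R_S · P. This is an illustration of why the two sides agree on K only; it is not secure and is not the paper's implementation. It also shows the additive secret-state splitting from item (1), holding SK_C as two shares that are never explicitly recombined.

```python
import secrets

q = 2**61 - 1          # prime modulus: toy stand-in for the group order
P = 7                  # toy "base point"

def pmul(k, point):
    """'Scalar multiplication' in the toy additive group (Z_q, +)."""
    return (k * point) % q

# Long-term keys: PK = SK * P (toy analogue of an ECC key pair).
SK_C = secrets.randbelow(q); PK_C = pmul(SK_C, P)
SK_S = secrets.randbelow(q); PK_S = pmul(SK_S, P)

# Leakage-resilient storage (item (1)): SK_C kept as two additive shares.
r0 = secrets.randbelow(q)
SK_C1, SK_C2 = r0, (SK_C - r0) % q

# Step 1: C -> S sends U_C = R_C + SK_C, computed share by share.
R_C = secrets.randbelow(q)
U_C = (R_C + SK_C1 + SK_C2) % q

# Step 2: S -> C sends U_S = R_S + SK_S, and S computes K.
R_S = secrets.randbelow(q)
U_S = (R_S + SK_S) % q
K_server = pmul(R_S, (pmul(U_C, P) - PK_C) % q)

# Step 3: C computes K the same way from S's message.
K_client = pmul(R_C, (pmul(U_S, P) - PK_S) % q)

assert K_client == K_server   # both equal R_C * R_S * P in the toy group
```

A real instantiation would replace this group with an actual elliptic curve and add the isogeny-based signatures and the share refresh described next in the text.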


Applying the leakage-resilient algorithm:

Initially, C splits the long-term private key SK_C into two random states SK_C1 and SK_C2 through the following steps:

Step 1. Generate a random value r_0 and set SK_C1 = r_0.
Step 2. Set SK_C2 = SK_C − r_0.
Step 3. Destroy SK_C and store SK_C1 and SK_C2 secretly in two separate places.

Similarly, S splits the long-term private key SK_S into two random states SK_S1 and SK_S2.

Then, in the key agreement process, C (resp. S) uses SK_C1 and SK_C2 (resp. SK_S1 and SK_S2) for the computations, and refreshes the stored SK_C1 and SK_C2 (resp. SK_S1 and SK_S2) as follows:

Step 1. Generate a random value r.
Step 2. Refresh SK_C1 ← SK_C1 + r.
Step 3. Refresh SK_C2 ← SK_C2 − r.

Applying the post-quantum algorithm:

In steps 2 and 3 of the original AKA protocol, use an isogeny-based digital signature algorithm (e.g., SeaSign [17]) to compute Sig_C and Sig_S.

Applying the lightweight method:

Assume S is more powerful than C. The method transfers some time-consuming computations from C to S as follows: in step 2 of the original AKA protocol, S computes T_S = U_S · P and sends T_S instead of U_S to C; in step 3, K is computed via K = R_C · (T_S − PK_S).

IV. EVALUATION

Taking the above AKA protocol as an example, we evaluate its performance in terms of the computation burden on both sides. As ECC scalar multiplication is the most time-consuming operation in the protocol, we compare the number of scalar multiplications on both sides of the original and proposed protocols in Table I.

TABLE I. NUMBER OF SCALAR MULTIPLICATIONS ON C AND S

        Original Protocol    The Proposed Protocol
  C             2                      1
  S             2                      3

The proposed protocol transfers one scalar multiplication from the limited C to the more powerful S; therefore, it is more friendly to limited devices than the original protocol. Besides, the proposed protocol is expected to perform better than the original one overall, as more of the computation is undertaken by the powerful side.

V. CONCLUSION

The proposed security mechanism can address the security problems posed by two emerging attacks; meanwhile, it is friendly to resource-constrained devices in the blockchain-powered intelligent edge. In our future work, we will design and realize a suite of security protocols and algorithms, prove their security under well-defined models, and study their performance through experiments.

ACKNOWLEDGMENT

Jie Zhang wishes to thank the National Natural Science Foundation of China under Grant No. 62002296; the Natural Science Foundation of Jiangsu Province under Grant No. BK20200250; the XJTLU Key Programme Special Fund under Grant No. KSF-E-54. Ka Lok Man wishes to thank the AI University Research Centre (AI-URC), Xi'an Jiaotong-Liverpool University, Suzhou, China, for supporting his related research contributions to this article through the XJTLU Key Programme Special Fund (KSF-E-65) and Suzhou-Leuven IoT & AI Cluster Fund.

REFERENCES

[1] H. Liu, Y. Zhang, and T. Yang, "Blockchain-enabled security in electric vehicles cloud and edge computing," IEEE Network, 32(3), pp. 78-83, 2018.
[2] X. Xu, X. Zhang, H. Gao, Y. Xue, L. Qi, and W. Dou, "BeCome: Blockchain-enabled computation offloading for IoT in mobile edge computing," IEEE Transactions on Industrial Informatics, 16(6), pp. 4187-4195, 2020.
[3] K. Gai, Y. Wu, L. Zhu, L. Xu, and Y. Zhang, "Permissioned blockchain and edge computing empowered privacy-preserving smart grid networks," IEEE Internet of Things Journal, 6(5), pp. 7992-8004, 2019.
[4] J. Pan, J. Wang, A. Hester, I. Alqerm, Y. Liu, and Y. Zhao, "EdgeChain: An edge-IoT framework and prototype based on blockchain and smart contracts," IEEE Internet of Things Journal, 6(3), pp. 4719-4732, 2019.
[5] S. Tuli, R. Mahmud, S. Tuli, and R. Buyya, "FogBus: A blockchain-based lightweight framework for edge and fog computing," Journal of Systems and Software, 154, pp. 22-36, 2019.
[6] S. Guo, X. Hu, S. Guo, X. Qiu, and F. Qi, "Blockchain meets edge computing: A distributed and trusted authentication system," IEEE Transactions on Industrial Informatics, 16(3), pp. 1972-1983, 2019.
[7] P. C. Kocher, "Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems," in Advances in Cryptology-CRYPTO'96, pp. 104-113, 1996.
[8] P. C. Kocher, J. Jaffe, and B. Jun, "Differential power analysis," in Advances in Cryptology-CRYPTO'99, pp. 388-397, 1999.
[9] D. Agrawal, B. Archambeault, J. R. Rao, and P. Rohatgi, "The EM side-channel(s)," in Cryptographic Hardware and Embedded Systems-CHES 2002, pp. 29-45, 2003.
[10] S. Tizpaz-Niari, P. Cerny, and A. Trivedi, "Data-driven debugging for functional side channels," in Network and Distributed Systems Security (NDSS) Symposium 2020, Feb. 2020.
[11] J. Berger, A. Klein, and B. Pinkas, "Flaw label: Exploiting IPv6 flow label," in IEEE Symposium on Security and Privacy (SP), pp. 1259-1276, 2020.
[12] S. Wang, Y. Bao, X. Liu, P. Wang, D. Zhang, and D. Wu, "Identifying cache-based side channels through secret-augmented abstract interpretation," in 28th USENIX Security Symposium (USENIX Security 19), pp. 657-674, 2019.
[13] S. Bhasin, A. Chattopadhyay, A. Heuser, D. Jap, S. Picek, and R. R. Shrivastwa, "Mind the portability: A warriors guide through realistic profiled side-channel analysis," in Network and Distributed Systems Security (NDSS) Symposium 2020, Feb. 2020.
[14] B. Hettwer, S. Gehrer, and T. Güneysu, "Applications of machine learning techniques in side-channel attacks: a survey," Journal of Cryptographic Engineering, 10(2), pp. 135-162, 2020.


[15] L. Masure, C. Dumas, and E. Prouff, "A comprehensive study of deep learning for side-channel analysis," IACR Transactions on Cryptographic Hardware and Embedded Systems, pp. 348-375, 2020.
[16] NIST, "Status report on the second round of the NIST post-quantum cryptography standardization process," available online: https://csrc.nist.gov/publications/detail/nistir/8309/final, July 2020.
[17] L. De Feo and S. D. Galbraith, "SeaSign: Compact isogeny signatures from class group actions," in Proc. Annu. Int. Conf. Theory Appl. Cryptographic Techn., pp. 759-789, 2019.
[18] P. Kocher, "Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems," in Annual International Cryptology Conference, pp. 104-113, 1996.
[19] P. Kocher, J. Jaffe, and B. Jun, "Differential power analysis," in Annual International Cryptology Conference, pp. 388-397, 1999.
[20] J.-J. Quisquater and D. Samyde, "Electromagnetic analysis (EMA): Measures and countermeasures for smart cards," in Smart Card Programming and Security, pp. 200-210, 2001.
[21] D. A. Osvik, A. Shamir, and E. Tromer, "Cache attacks and countermeasures: the case of AES," in Cryptographers' Track at the RSA Conference, pp. 1-20, 2006.
[22] E. Biham and A. Shamir, "Differential fault analysis of secret key cryptosystems," in Annual International Cryptology Conference, pp. 513-525, 1997.
[23] Y. Shin, H. C. Kim, D. Kwon, J. H. Jeong, and J. Hur, "Unveiling hardware-based data prefetcher, a hidden source of information leakage," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 131-145, 2018.
[24] M. A. Islam and S. Ren, "Ohm's law in data centers: A voltage side channel for timing power attacks," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 146-162, 2018.
[25] G. Camurati, S. Poeplau, M. Muench, T. Hayes, and A. Francillon, "Screaming channels: When electromagnetic side channels meet radio transceivers," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 163-177, 2018.
[26] J. Van Bulck, F. Piessens, and R. Strackx, "Nemesis: Studying microarchitectural timing leaks in rudimentary CPU interrupt logic," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 178-195, 2018.
[27] H. Naghibijouybari, A. Neupane, Z. Qian, and N. Abu-Ghazaleh, "Rendered insecure: GPU side channel attacks are practical," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 2139-2153, 2018.
[28] G. Maisuradze and C. Rossow, "ret2spec: Speculative execution using return stack buffers," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 2109-2122, 2018.
[29] P. Frigo, C. Giuffrida, H. Bos, and K. Razavi, "Grand Pwning unit: accelerating microarchitectural attacks with the GPU," in 2018 IEEE Symposium on Security and Privacy, pp. 357-372, 2018.
[30] S. Shin, K. Kobara, and H. Imai, "Leakage-resilient authenticated key establishment protocols," in International Conference on the Theory and Application of Cryptology and Information Security, pp. 155-172, 2003.
[31] O. Ruan, J. Chen, and M. Zhang, "Provably leakage-resilient password-based authenticated key exchange in the standard model," IEEE Access, 5, pp. 26832-26841, 2017.
[32] O. Ruan, Y. Zhang, M. Zhang, J. Zhou, and L. Harn, "After-the-fact leakage-resilient identity-based authenticated key exchange," IEEE Systems Journal, 12(2), pp. 2017-2026, 2018.
[33] D. S. Wong and A. H. Chan, "Efficient and mutually authenticated key exchange for low power computing devices," in International Conference on the Theory and Application of Cryptology and Information Security, pp. 272-289, 2001.
[34] B. LaMacchia, K. Lauter, and A. Mityagin, "Stronger security of authenticated key exchange," in Proceedings of the 1st International Conference on Provable Security, pp. 1-16, 2007.
[35] L. Ni, G. Chen, J. Li, and Y. Hao, "Strongly secure identity-based authenticated key agreement protocols," Computers and Electrical Engineering, 37(2), pp. 205-217, 2011.
[36] S. H. Islam, "A provably secure ID-based mutual authentication and key agreement scheme for mobile multi-server environment without ESL attack," Wireless Personal Communications, 79(3), pp. 1975-1991, 2014.
[37] J. Zhang, N. Xue, and X. Huang, "A secure system for pervasive social network-based healthcare," IEEE Access, 4, pp. 9239-9250, 2016.
[38] J. Zhang, X. Huang, P. Craig, A. Marshall, and D. Liu, "An improved protocol for the password authenticated association of IEEE 802.15.6 standard that alleviates computational burden on the node," Symmetry, 8(11), 131, 2016.
[39] J. Cai, X. Huang, J. Zhang, J. Zhao, Y. Lei, D. Liu, and X. Ma, "A handshake protocol with unbalanced cost for wireless updating," IEEE Access, 6, pp. 18570-18581, 2018.
[40] J. Zhang, "Authenticated key exchange protocols with unbalanced computational requirements," Doctoral dissertation, University of Liverpool, 2018.

2021 International Conference on Digital Contents: AICo(AI, IoT and Contents) Technology (DigiCon-21)

Smart contract-based operation technique for secure accessing distributed contents

Yunhee Kang
Div. of Computer Engineering
Baekseok University
Cheonan, Korea
[email protected]

Young B. Park
Department of Software Science
Dankook University
Yongin, Korea
[email protected]

Abstract—In a cloud environment, a centralized content management naturally embeds security risks including personal information exposure [4]. In this paper, to solve the security risk, the original digital content is divided and encrypted, and then a system is designed for distributing it securely in a decentralized P2P storage system; a prototype is then constructed. To achieve the goal, a digital content is encrypted using a symmetric key based encryption technique, and the encrypted content is stored in the IPFS storage. In this process, access information is managed by smart contracts. To access the distributed encrypted content, a smart contract is defined for delivering a secure digital content.

Keywords—cloud environment; security risks; P2P storage system; encryption; smart contract

I. INTRODUCTION

A. Background and motivation

Over the past several years, blockchain technology has been driving a breakthrough in many areas including financial and non-financial applications requiring traceability, transparency, trust and accountability [1,2]. Blockchain offers the ability to create unique digital assets. This innovation stems from the promise of Bitcoin [1], a P2P electronic cash operated to solve the double spending problem without any intervention of a trusted intermediary. The blockchain data structure is defined as an ordered, back-linked record of blocks of transactions. It can be kept in a database or saved as a file. A blockchain platform, Ethereum, extended the capabilities of the Bitcoin blockchain by adding a new feature, the smart contract [3,4].

In a cloud environment, a centralized content management naturally embeds security risks including personal information exposure [5]. Sometimes data held by third-party servers may be stolen or leaked by diverse attacks, because the security and reliability of a traditional centralized cloud system depend on the cloud-service provider. In this paper, to solve the security risk, we apply blockchain technology to provide a way of dealing with the original digital content, divided and encrypted.

B. Contribution

This paper proposes a system that is designed for distributing a digital content securely in a decentralized P2P storage system, and then constructs a prototype running on an Ethereum node. To achieve the goal, the digital content is encrypted using a symmetric key based encryption technique, and the encrypted content is stored in the IPFS storage. In this process, access information is managed using a smart contract [6]. The distributed encrypted content is handled by a smart contract that is defined for delivering a secure digital content. The smart contract is configured to operate with the Ethereum blockchain, and a DApp (Decentralized Application) acts as an Ethereum client. The DApp has its backend code running on a decentralized peer-to-peer network without a central server.

II. RELATED WORKS

A. InterPlanetary File System (IPFS)

IPFS stands for InterPlanetary File System. It is a P2P based decentralized storage system for maintaining large-capacity files or data [7]. It was created by Juan Benet, who later founded Protocol Labs in May 2014. IPFS has no single point of failure and nodes do not need to trust each other; distributed content delivery can also save bandwidth consumption [8]. Any user in the network can serve a file by its content address, and other peers in the network can find and request that content from any node who has it using a distributed hash table.

IPFS aims to create a single global network. By storing data in an IPFS network file system, the IPFS hash for that data is obtained. It can store and share data based on a distributed file system. When saving a file, it returns the CID (Content Identifier). However, IPFS does not provide a function for image integrity verification. To support confidentiality and integrity of the digital content stored in IPFS, we apply the encryption scheme and blockchain. In this work, encrypted files are stored in IPFS.

B. Smart contract

A smart contract is a program that directly controls the exchanges or redistributions of digital assets between two or more members according to certain rules or conditions confirmed between the involved members. It is an automated program designed for handling business contract processing on the Ethereum platform.

Szabo defines smart contracts as computerized transaction protocols that execute the terms of a contract [4]. It is used to permit trusted transactions and agreements to be carried out

among disparate, anonymous parties without the need for a central authority, legal system, or external enforcement mechanism.

In this paper, a smart contract is a program that is written in Solidity, compiled, and executed on the Ethereum Virtual Machine (EVM). The EVM behaves as a mathematical function would: given an input, it produces a deterministic output. This is used to formally describe Ethereum as having a state transition function. Ethereum smart contracts use a cryptocurrency called Ether (ETH) to pay fees and execute program functions. In this paper, a smart contract is used to store the CID value for content stored in IPFS in the blockchain.
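The state-transition view of the EVM described above can be illustrated with a toy example. This is only a Python stand-in, not EVM semantics: a deterministic function takes a state and a transaction and returns a new state, and the key-value "balances" model is hypothetical.

```python
# Toy illustration of a deterministic state-transition function,
# in the spirit of state' = apply(state, tx). This is NOT the EVM:
# it models only a simple key-value store with balance transfers.

def apply_tx(state, tx):
    """Pure function: the same (state, tx) always yields the same new state."""
    new_state = dict(state)  # never mutate the input state
    sender, receiver, amount = tx
    if new_state.get(sender, 0) < amount:
        return new_state  # an invalid tx leaves the state unchanged
    new_state[sender] -= amount
    new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state

genesis = {"alice": 10, "bob": 0}
s1 = apply_tx(genesis, ("alice", "bob", 3))
print(s1)  # {'alice': 7, 'bob': 3}
```

Because the function is pure, every node replaying the same transactions from the same genesis state reaches the same final state, which is the property the deterministic-output description relies on.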

III. SYSTEM DESIGN

A. Overall structure of the designed system

The designed system of the distributed content access management uses a blockchain to ensure confidentiality, transparency and reliability. Figure 1 shows the overall architecture of the proposed system, which is applied to content written in CDM. As shown in Figure 1, the CDM is encrypted and delivered to IPFS. Each of the cipher CDMs represents a CDM encrypted in IPFS. The access information is handled by CDM encryption and CDM decryption in the trust managers between two hospitals, CDM provider and CDM consumer, via the distributed ledger running on the blockchain. Using an encryption scheme for distributed content, data production and consumption can be performed without third party intervention. The system consists of a digital content encryption/decryption module and a key management module including a function of key generation. A trust manager operates those modules. Here we only focus on CDM encryption and CDM decryption.

Fig. 1. The overall architecture of the proposed system

B. Encryption Scheme

A block cipher is suitable for the secure cryptographic transformation, via encryption or decryption, of one fixed-length group of bits called a block. The encryption and decryption of a block cipher are operated based on a symmetric key, which enables performance improvement by applying Cipher Block Chaining (CBC). In CBC mode, the previous cipher block is given as input to the next encryption algorithm after XOR with the original plaintext block. Figure 2 shows the CBC encryption process.

Fig. 2. The CBC encryption process

An initialization vector (IV) is an input to a cryptographic primitive of CBC, used to provide the initial state in Figure 2. The CBC based encryption scheme is described by the key-generation algorithm (Gen), having a key-space (K), the encryption (Enc) and decryption (Dec) algorithms, and the data-space (D):

● Gen generates a random key (32 bytes) and an IV (16 bytes);
● Enc/Dec is based on the Advanced Encryption Standard (AES) in Cipher Block Chaining mode with the IV;
● D: digital content.

IV. IMPLEMENTATION

A. Experimental setup

In this experiment, we built a prototype to evaluate the designed system, which runs on the Linux operating system (CentOS 7.x, IPFS 0.5, Truffle 5.3.6 and the MetaMask plugin). Truffle is a testing framework and asset pipeline for blockchains using the EVM. To handle secure delivery of a content as a digital asset, a smart contract written in Solidity is deployed on the Ethereum blockchain for storing and accessing keys for encryption.

To build a DApp for interacting with an Ethereum network, we use the React framework in NodeJS and the web3 library in JavaScript. Web3 handles JSON-RPC, a stateless, lightweight remote procedure call (RPC) protocol, for an Ethereum blockchain node. The encryption module is exposed as a REST API in the Flask framework.

B. Experimental results

Figure 3 shows the process of encrypting and storing a digital content in the designed system. As shown in Figure 3, the content is uploaded into IPFS after being encrypted via a web based server using Flask. A CID with the location information returned from IPFS is stored into an Ethereum blockchain running on Truffle. The encryption key used for encrypting the content is also stored on the Ethereum blockchain by the smart contract with a setting function.
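The store path just described (encrypt, add to IPFS, record the CID and key on-chain via the setting function) can be mimicked locally. Real IPFS CIDs are multihash/CIDv1 identifiers produced by the IPFS daemon; the SHA-256 hex digest below is only an illustrative stand-in, and the in-memory dicts stand in for IPFS and the contract state.

```python
import hashlib

ipfs_store = {}      # stand-in for IPFS: digest -> encrypted bytes
contract_state = {}  # stand-in for the smart contract: name -> (cid, key)

def fake_cid(data: bytes) -> str:
    # Illustrative only: real CIDs are multihash-encoded, not plain sha256 hex.
    return hashlib.sha256(data).hexdigest()

def store_content(name: str, encrypted: bytes, key_hex: str) -> str:
    cid = fake_cid(encrypted)
    ipfs_store[cid] = encrypted            # like "ipfs add"
    contract_state[name] = (cid, key_hex)  # like the contract's setting function
    return cid

def fetch_content(name: str):
    cid, key_hex = contract_state[name]    # like the contract's getting function
    return ipfs_store[cid], key_hex

cid = store_content("report.cdm", b"\x01\x02cipher", "a1b2")
assert fetch_content("report.cdm") == (b"\x01\x02cipher", "a1b2")
```

The content name, key value and `report.cdm` file are hypothetical; the point is only that the content address is derived from the encrypted bytes, so the same ciphertext always maps to the same identifier.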

Fig. 3. The process of encryption and storing a digital content
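The CBC chaining of Section III.B (Gen producing a random key and IV; each plaintext block XORed with the previous ciphertext block before encryption) can be sketched with a toy block cipher. The single XOR "cipher" below is a stand-in for AES purely to show the chaining, provides no security, and its key is block-sized rather than the scheme's 32-byte AES key.

```python
import os

BLOCK = 16  # block size in bytes, as in AES

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(block, key):
    # Stand-in for the AES block transform (XOR with key; NOT secure).
    return xor(block, key)

def cbc_encrypt(plaintext, key, iv):
    assert len(plaintext) % BLOCK == 0, "toy sketch: pad input to the block size"
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        block = xor(plaintext[i:i + BLOCK], prev)  # chain with previous cipher block
        prev = toy_encrypt_block(block, key)
        out += prev
    return out

def cbc_decrypt(ciphertext, key, iv):
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor(toy_encrypt_block(block, key), prev)  # XOR cipher inverts itself
        prev = block
    return out

key, iv = os.urandom(BLOCK), os.urandom(BLOCK)  # Gen: random key and IV
msg = b"sixteen byte msg" * 2
assert cbc_decrypt(cbc_encrypt(msg, key, iv), key, iv) == msg
```

Because of the chaining, two identical plaintext blocks produce different ciphertext blocks, which is the property that motivates using CBC rather than encrypting each block independently.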

A digital content is divided and encrypted by the CBC method and stored in the IPFS storage. The key information for encryption and the CID, which is the location information of the IPFS storage, are stored in the blockchain using a smart contract. In the encryption process, when a user uploads an image to the server through a client, the server goes through the image encryption process. When the encrypted file and signature are stored in IPFS, the two returned CIDs are sent to the client. Finally, the user calls the smart contract and stores them in the Ethereum blockchain as shown in Figure 4.

Fig. 4. The key and signature stored in IPFS

Figure 5 shows the process of decrypting and accessing a digital content in the designed system. To access the encrypted contents located in IPFS, the CID and the key are returned by the smart contract with a getting function.

Fig. 5. The process of decryption and accessing a digital content

As shown in Figure 5, the decryption process transmits the CID to the server, and the server requests the data. A DApp for dividing, encrypting and decrypting digital contents, and for managing keys, is used to call smart contracts by JSON-RPC. To access the encrypted file on IPFS, the key information in the DApp is accessed by calling the smart contract. Figure 6 shows the result of the decrypted content.

Fig. 6. Example of the decrypted content

V. CONCLUSION

In this paper, the operation scheme for accessing the encrypted segmentation information of data, after distributing and maintaining it using a blockchain, is described. The DApp, a blockchain system application, has the advantage of being able to access encrypted and distributed data without third party intervention for key management of distributed content and for content location management. In the future, the developed smart contract for key management will be used for automatic verification of the integrity of cloud CDM medical information data, which deals with privacy information including personal information, and will be used as a method to ensure the reliability of digital data management.

ACKNOWLEDGEMENT

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2017-0-01628) supervised by the IITP (Institute for Information & communications Technology Promotion). This work was supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2021-0-00177, High Assurance of Smart Contract for Secure Software Development Life Cycle).

REFERENCES

[1] Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System, May 2009. Available online: http://www.bitcoin.org/bitcoin.pdf (accessed on 23 April 2021).
[2] Al Omar, A.; Rahman, M.S.; Basu, A.; Kiyomoto, S. Medibchain: A blockchain based privacy preserving platform for healthcare data. In Proceedings of the International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage, Guangzhou, China, 12–15 December 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 534–543.

[3] K. Christidis, M. Devetsikiotis. Blockchains and smart contracts for the Internet of Things. IEEE Access 4 (2016) 2292–2303.
[4] Szabo, N. Smart Contracts: Formalizing and Securing Relationships on Public Networks. First Monday, Volume 2, No. 9, 1997. Available online: http://firstmonday.org/article/view/548/46
[5] Zhang, R.; Xue, R.; Liu, L. Security and Privacy on Blockchain. ACM Comput. Surv. 2019, 52, 1–34, doi:10.1145/3316481.
[6] Nizamuddin, N.; Hasan, H.; Salah, K.; Iqbal, R. Blockchain-Based Framework for Protecting Author Royalty of Digital Assets. Arab. J. Sci. Eng. 2019, 44, 3849–3866.
[7] Khatal, S.; Rane, J.; Patel, D.; Patel, P.; Busnel, Y. FileShare: A Blockchain and IPFS Framework for Secure File Sharing and Data Provenance. In Proceedings of Computing Algorithms with Applications in Engineering; Springer: Cham, Switzerland, 2020; pp. 825–833.
[8] Benet, J. IPFS - content addressed, versioned, p2p file system. arXiv 2014, arXiv:1407.3561.

Study on the Future Prediction Service Based on Ultra-Low Power Communication Application using the Edge Platform

Yeonggwang Kim1
Dept. of ICT Convergence System Engineering
Chonnam National University, 77, Yongbong-ro, Buk-gu
Gwangju, Republic of Korea
[email protected]

Jinsul Kim*
Dept. of ICT Convergence System Engineering
Chonnam National University, 77, Yongbong-ro, Buk-gu
Gwangju, Republic of Korea
[email protected]

Abstract—Along with A.I. technology, IoT technology has grown rapidly as AIoT (AI of Things) technology. In addition, recognizing data values through sensors and performing deep learning tasks have become a reality. In this paper, we introduce the utilization of low-power communication protocols to reduce data loss using IoT sensors, and techniques to store data values in the form of a cloud database through web servers. The collected data is learned over an Artificial Neural Network, and we introduce an intelligent weather prediction monitoring system that predicts weather through this model.

Keywords—A.I.; AIoT; ANN; Cloud Database; Weather Prediction; Monitoring System

I. INTRODUCTION

We have witnessed the development and deployment of evolving wireless communication in the past decade [1–3]. With the high-speed data transmission capability enabled by the wireless communication network, people have envisioned the Internet-of-Things (IoT) system towards the smart city, where numerous interrelated IoT devices are linked together in a huge and complicated network [4–7]. The trend of artificial intelligence (AI) builds up the other cornerstone for the smart city. AI helps with big data processing, assisting humans to analyze complicated systems and make decisions [8–11]. The synergy between AI and IoT has given rise to the artificial intelligence of things (AIoT), where IoT serves like the nervous system to deliver information to the AI network for decision making. Compared with the IoT system, which can collect, store, and process weather data, analysis and control functions are added in the Ultra-Low Power AIoT system. In this paper, numerous physical sensors are deployed to collect various surrounding information at different locations: temperature sensors, humidity sensors and light sensors at each level of each building. The collected data is trained using an Artificial Neural Network, and we present an intelligent weather prediction monitoring system based on this model.

II. METHODS AND MATERIALS

A. Research Methods and Environment

This research mainly consists of a low-power communication protocol, an MQTT (Message Queuing Telemetry Transport) broker, a dataset collection sensor for temperature and air pressure, and an intelligent cloud web server system. The MQTT broker used NVIDIA's AI board and was tasked with measuring sensors. The collected datasets are stored in a MongoDB Cloud Database. The datasets were built separately by day, classifying the highest temperature, lowest temperature, highest humidity, lowest humidity, highest barometric pressure, and lowest barometric pressure. The intelligent cloud web server system accesses MongoDB on the NodeJS stack to obtain values. It also provides a monitoring system that shows the values of the database on the web.

B. Datasets Materials

Datasets for predicting weather are accumulated via Nvidia IoT devices and added to the dataset as shown in Fig. 1. Erroneous missing values and values for previous data were added through the Korea Meteorological Administration dataset [12]. So, we studied with a total of 2000 datasets.

Fig. 1. Temperature, Humidity, Barometric datasets collected through IoT devices
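The per-day dataset construction described in Section II.A (highest/lowest temperature, humidity and barometric pressure for each day) can be sketched in plain Python; the reading tuples below are hypothetical, not the paper's data.

```python
from collections import defaultdict

# Hypothetical sensor readings: (date, temperature, humidity, pressure)
readings = [
    ("2021-07-01", 24.1, 61.0, 1012.0),
    ("2021-07-01", 29.8, 55.2, 1009.5),
    ("2021-07-02", 23.0, 70.3, 1011.2),
    ("2021-07-02", 27.4, 64.8, 1013.1),
]

def daily_extremes(rows):
    """Build per-day records holding the highest/lowest of each measurement."""
    by_day = defaultdict(list)
    for date, t, h, p in rows:
        by_day[date].append((t, h, p))
    dataset = {}
    for date, vals in by_day.items():
        ts, hs, ps = zip(*vals)
        dataset[date] = {
            "temp_max": max(ts), "temp_min": min(ts),
            "humid_max": max(hs), "humid_min": min(hs),
            "press_max": max(ps), "press_min": min(ps),
        }
    return dataset

ds = daily_extremes(readings)
print(ds["2021-07-01"]["temp_max"])  # 29.8
```

In the described pipeline, each per-day record would then be written to the MongoDB collection rather than kept in a local dict.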


III. EXPERIMENT AND RESULTS

A. Experimental Algorithms

The datasets collected through the MQTT Broker are sent to the Cloud Database, where the learning is done with the existing datasets. The learning algorithm performed learning using the Artificial Neural Network shown in Fig. 2, augmenting the dense layers.

Fig. 3. Artificial Neural Network algorithms summary used in the experiment

We also found that the loss value gradually decreased with the number of learning epochs, resulting in a lower loss value as the learning progresses, as shown in Figure 4.

Fig. 4. Artificial Neural Network algorithms summary used in the experiment

The learning time took about 10 seconds per epoch, the maximum accuracy was 84.92%, and the loss value was 0.3648 based on validation.
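The loss-per-epoch behaviour reported above can be reproduced in miniature. The sketch below is not the authors' network; it fits a single linear neuron by gradient descent on hypothetical data, only to show the loss shrinking epoch by epoch.

```python
# Minimal gradient-descent loop on y = 2x with one weight,
# illustrating a loss value that decreases across epochs.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]

w, lr = 0.0, 0.05
losses = []
for epoch in range(20):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
    mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    losses.append(mse)

assert losses[-1] < losses[0]  # loss decreased over training
```

With a full network, the same loop shape appears per epoch; here the loss decays geometrically because the problem is a one-dimensional quadratic.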

Fig. 2. Artificial Neural Network algorithm summary used in the experiment

B. The result of an experiment

As the learning progresses, as shown in Figure 3, we find that the accuracy (%) increases gradually with the number of learning epochs, resulting in higher accuracy.

IV. CONCLUSION

The simple structured Artificial Neural Network algorithm has good accuracy and is suitable for weather forecasting. However, the accuracy did not exceed 85%, and it was not suitable for more accurate predictions. We would like to apply different CNN-based algorithms to conduct research that can benefit the accuracy aspect. CNN-based algorithms, which have complex structures, will take a long time to learn, so they will proceed with learning on the side of the artificial intelligence web server. Finally, in order to use a low power based communication system in the practical phase, we will conduct research on practical problems and outliers and develop them into the commercialization phase.

ACKNOWLEDGMENT

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-


2016-0-00314) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation). Furthermore, this research was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the ICT R&D program of MSIT/IITP [2019-0-00260, Hyper-Connected Common Networking Service Research Infrastructure Testbed].

REFERENCES

[1] J. Thompson, X. Ge, H.C. Wu, R. Irmer, H. Jiang, G. Fettweis, S. Alamouti. 5G wireless communication systems: prospects and challenges [Guest Editorial]. IEEE Commun. Mag. 52 (2014) 62–64.
[2] W. Roh, J.Y. Seol, J.H. Park, B. Lee, J. Lee, Y. Kim, J. Cho, K. Cheun, F. Aryanfar. Millimeter-wave beamforming as an enabling technology for 5G cellular communications: theoretical feasibility and prototype results. IEEE Commun. Mag. 52 (2014) 106–113.
[3] L. Pierucci. The quality of experience perspective toward 5G technology. IEEE Wirel. Commun. 22 (2015) 10–16.
[4] B.L. Risteska Stojkoska, K.V. Trivodaliev. A review of internet of things for smart home: challenges and solutions. J. Clean. Prod. 140 (2017) 1454–1464.
[5] D. Mocrii, Y. Chen, P. Musilek. IoT-based smart homes: a review of system architecture, software, communications, privacy and security. Internet of Things 1–2 (2018) 81–98.
[6] R. Molina-Masegosa, J. Gozalvez. LTE-V for sidelink 5G V2X vehicular communications: a new 5G technology for short-range vehicle-to-everything communications. IEEE Veh. Technol. Mag. 12 (2017) 30–39.
[7] S. Talari, M. Shafie-Khah, P. Siano, V. Loia, A. Tommasetti, J.P.S. Catalão. A review of smart cities based on the internet of things concept. Energies 10 (2017) 421.
[8] Q. Shi, Z. Zhang, T. He, Z. Sun, B. Wang, Y. Feng, X. Shan, B. Salam, C. Lee. Deep learning enabled smart mats as a scalable floor monitoring system. Nat. Commun. 11 (2020) 4609.
[9] T. Teich, F. Roessler, D. Kretz, S. Franke. Design of a prototype neural network for smart homes and energy efficiency. In: Procedia Engineering, Elsevier B.V., 2014, pp. 603–608.
[10] H.D. Mehr, H. Polat, A. Cetin. Resident activity recognition in smart homes by using artificial neural networks. In: Proceedings of the 4th IEEE International Istanbul Smart Grid and Cities Congress and Fair, ICSG 2016, pp. 16–20.
[11] B. Dong, Q. Shi, Y. Yang, F. Wen, Z. Zhang, C. Lee. "Technology evolution from self-powered sensors to AIoT enabled smart homes." Nano Energy, vol. 79, Jan. 2021, Art. no. 105414.
[12] Korea Meteorological Administration dataset (https://www.kma.go.kr/eng/)

Application of TadGAN for detecting Electric Power Usage Anomaly

Sangwon Oh1
Dept. of ICT Convergence System Engineering
Chonnam National University, 77, Yongbong-ro, Buk-gu
Gwangju, Republic of Korea
[email protected]

Jinsul Kim*
Dept. of ICT Convergence System Engineering
Chonnam National University, 77, Yongbong-ro, Buk-gu
Gwangju, Republic of Korea
[email protected]

Abstract—In modern society, energy use has increased exponentially compared to the past, regardless of household, commercial or business use. As a result, suppliers who provide power need to manage how users consume electric power. Most power data are time series data, and the emerging problem is anomaly detection. Therefore, it is necessary to detect anomalies in users' electric power usage data in order to efficiently manage power. In this paper, we use the TadGAN model to detect outliers in power consumption. To evaluate the performance, we specify thresholds using the ARIMA model and set a ground-truth which considers values beyond them as outliers. We compared nine cases of TadGAN hyperparameter settings with the ground-truth, respectively, to yield F1-scores.

Keywords—A.I.; GAN; Anomaly Detection; Time Series Data; TadGAN; Wasserstein Loss; ARIMA

I. INTRODUCTION

Most people living in modern civilization are provided with electricity produced by the state or private enterprise. The electric power supply consists of a few (government or private business) generating large amounts of electricity and providing it to a large number of people (customers using power), making it important to manage the power received and used efficiently. In particular, from the perspective of providing electricity, it is necessary to understand how customers use the electric power they receive to make plans to reduce various costs in terms of production and supply. Therefore, analyzing data on how customers use electric power has become an effective task for most electric power providers.

In general, data representing the electric power used by customers are stored and recorded in time series form. This is because the detector records electric power usage per time unit. The data obtained through this process have various features, such as having cycles or showing trends over time. However, the presence of anomalies in the data hinders extracting the features of the data. We can solve this by detecting anomalies in advance within time series data.

Methods for detecting anomalies within time series data have been studied throughout various algorithms. There are statistical-based methods and machine learning-based methods. Statistical-based methods have been studied using regular-scale models or regression models [1], and machine learning-based methods have been studied using clustering algorithms or near-neighbor (NN) algorithms [2]. Recently, new architectures applying the GAN (Generative Adversarial Network) architecture have been studied for use on time series data [3].

In this paper, we detect anomalies using the TadGAN model among networks utilizing GAN. Among the existing GAN-based models, we adopt TadGAN as it has a good factor to reflect the characteristics of time series data. The remaining part of this paper is organized as follows. Section II introduces the related works. Section III presents our dataset for learning, the proposed TadGAN model, and its hyperparameter settings. Finally, Section IV summarizes the whole paper and proposes possible future work.

II. RELATED WORK

Anomaly detection technology is increasing throughout various industries. Recently, technologies have been developed to observe something or collect values generated through sensors over time. As a result, many technologies have been studied to analyze the recorded time series data. In order to detect anomalies, the first thing to do is to define what an anomaly is. First of all, anomalies can be classified into two categories in time series data.

• Point Anomaly means when an individual value reaches an abnormal value. Generally, thresholds are specified, and values out of bounds are considered anomalies.

• Collective Anomaly refers to an outlier in which an entire sequence of data is considered abnormal, even if the individual data are normal. An entire sequence of changes in the normal cycle, or deviations from the data's trend, is called a collective anomaly.

These kinds of anomalies are distributed in time series data. To detect them, various methods have been studied previously. They are divided into two parts according to the approach.

A. Anomaly Detection Approaches

• Statistical techniques can be used to define and detect anomalies. There is a parametric technique that largely hypothesizes that the data are generated from an estimated distribution. Typically, there are regression model-based methods such as ARIMA [4] and robust hypothesis tests based on normal models to


determine anomalies. If the data do not follow any particular model, we use nonparametric techniques. Typically, there is a way to define normal values based on histograms, or to detect anomalies based on kernel functions.

• Machine learning can also be used to define and detect anomalies. Typically there is a method to define anomalies based on the difference between the prediction value and the real value by learning a prediction model. Prediction models used mainly in this method include SVM [5], RNN [6], LSTM [7], etc. Differently, in addition to using predictive models, another approach has recently been proposed: the use of reconstruction. Reconstruction refers to the mapping of input data into a low dimensional space (latent space) and then restoring it to the dimension of the original data. Mapping data to low dimensions leaves only features that can explain the data well. Then reconstructing to the dimension of the original data eliminates anomalies, because the data are reconstructed with the features of the original data. Therefore, anomalies can be determined by comparing reconstructed and original data.

Among these methods of defining anomalies, this study adopted the idea of reconstruction. However, reconstruction-based anomaly detection methods have the limitation that models can easily become overfitted without a proper normalization function. Adversarial learning has proven to solve these problems. Among them, GAN-based algorithms were adopted to enable effective reconstruction.

B. GAN (Generative Adversarial Network)

A GAN consists of a generative model and a discriminative model [8]. Generative models make fake images similar to real images, which are the input data. Discriminative models distinguish generated fake images from real images. Two loss functions are used in a GAN. The discriminative model should be trained in a way that better discriminates real images, and the generative model should be trained in a way that makes the discriminator misjudge the generated fake images. For this purpose, the discriminative model scores the data in a GAN. If the score is close to zero, the data is judged to be a generated fake image, and if it is close to 1, it is judged to be a real image. Eventually, the discriminative model wants to maximize the score for real images, and the generative model wants to maximize the score for fake images. Repeatedly, the generative model learns the features of the input image well, which can be exploited for various purposes.

GAN is specialized for training on image data. However, we find it hard to use pure GAN-based models because we use electric power data with time series characteristics. Therefore, in this paper, we used TadGAN, which reflects the time series properties well and is specialized in sequential data.

C. TadGAN (Time series Anomaly Detection using GAN)

TadGAN is a model proposed by Alexander Geiger [9], overcoming the limitation of not being able to apply time series data to GANs and solving the problems of GANs. There are three differences between TadGAN and GAN.

• The first is that the discriminative model and the generative model work twice. In TadGAN, a new latent domain is introduced, generating and discriminating from the input domain to the latent domain, and reconstructing from the latent domain to the input domain again. Through this, we compare the original and reconstructed data to detect anomalies in the existing data.

• The second is the use of Wasserstein Loss. In pure GAN models, even if the generative model imbalances the data, it is not a problem at all, because no matter what value it makes, it just has to deceive the discriminator. This can lead to a mode collapse problem while generative models are learning. To solve this problem, the Wasserstein Loss function is applied. This reduces the probability of the gradient expansion problem occurring and improves the stability and reliability of model training. Thus, TadGAN applies Wasserstein Loss to the loss functions of the discriminative and generative models.

• The third is the addition of a cycle consistency loss function. Since TadGAN is a model designed for anomaly detection in time series data, it should reflect the characteristics of time series data well. Therefore, in addition to the existing two loss functions, TadGAN adds a new loss function, Cycle Consistency Loss. This minimizes the difference between the original and reconstructed samples.

III. EXPERIMENT

In this study, we use the TadGAN model to detect anomalies in electric power usage. Ground-truth was established by setting thresholds using the ARIMA model, and performance was evaluated by comparing hyperparameters within the TadGAN model while tuning them. We used a dataset that recorded the electric power usage of a building, provided by the Korea Energy Agency, from June 1, 2020 to August 24, 2020. We extracted and used only the electric power usage column among the many columns in the dataset. This data did not need a preprocessing step because missing values did not exist. The time unit for anomaly detection was based on one day, the minimum time unit with periodicity. The detailed raw dataset is shown in Fig. 1 below.

We then applied the ARIMA model to the extracted time series data to explore the optimal p, d, and q parameters. We found the search results to be most appropriate when (p, d, q) is (4, 1, 1). In addition, from the analyzed data, we consider the minimum unit of time (days) when the difference value is 10 or higher as an outlier, and extract the outliers as shown in Figure 2 below.
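The ground-truth rule described above (a day is an outlier when the difference value is 10 or higher) can be sketched as a simple threshold check; the usage numbers below are hypothetical, not the Korea Energy Agency data.

```python
def point_anomalies(daily_usage, threshold=10.0):
    """Flag indices whose day-over-day difference is >= threshold."""
    flags = []
    for i in range(1, len(daily_usage)):
        if abs(daily_usage[i] - daily_usage[i - 1]) >= threshold:
            flags.append(i)
    return flags

usage = [100.0, 103.0, 102.0, 131.0, 104.0, 105.0]  # hypothetical kWh per day
print(point_anomalies(usage))  # [3, 4]
```

Day 3 jumps up by 29 and day 4 drops by 27, so both cross the threshold; in the paper this kind of labeling is applied to the ARIMA-differenced series rather than the raw values.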

53 2021 International Conference on Digital Contents: AICo(AI, IoT and Contents) Technology (DigiCon-21)

In Figures 3-11, blue areas mark anomalies detected using ARIMA, and red areas mark anomalies detected using TadGAN.

TABLE I. F1-SCORE OF ANOMALY DETECTION USING TADGAN

Conjunction with   Reconstruction Error Calculation Method
Critic Score       Point Difference   Area Difference   DTW
Only REC           0.3486             0.1607            0.3069
Sum                0.1822             0.2868            0.2439
Multiply           0.3348             0.3209            0.3736
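The F1-scores in Table I combine precision and recall into a single number. A generic sketch of the computation from true positive, false positive, and false negative counts (the counts below are illustrative, not taken from the experiments):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(tp=6, fp=4, fn=10), 4))  # 0.4615
```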

Fig. 1. Provided Electric Power Usage Raw Dataset

Fig. 3. Result of Hyperparameter : area, Only Reconstruction Error

Fig. 4. Result of Hyperparameter : area, Multiply two indicators

Fig. 2. Anomaly Detect Result of using ARIMA in Power Usage Data

Then we trained on anomalies using the TadGAN model and compared the learning results with the ground-truths defined by the ARIMA model. At this stage, we choose the method of calculating the reconstruction error as a hyperparameter of TadGAN, along with the way it is combined with the critic score. The reconstruction error represents the difference between the real input value and the reconstructed value in the input domain. Three options exist to calculate it. The first is the Point-wise Difference (point), the simple difference between the reconstructed and actual values. The second is the Area Difference (area), a measure of similarity between local regions, defined as the mean difference between the areas under the two curves. The third is Dynamic Time Warping (DTW), a method of calculating the optimal match between two given time sequences, defined as the minimum distance between two curves of sequential values.

Once one of the discriminative models learns to distinguish between input sequences and generated fake sequences, its output can be used directly as an anomaly measure for time series sequences. Using this, a critic score can be assigned for each time step. The hyperparameters of the TadGAN used in this study are the method for calculating the reconstruction error and the way it is combined with the critic score: using only the reconstruction error, summing the two indicators, or multiplying the two indicators. The results are shown in Table I below and in Figures 3-11.

Fig. 5. Result of Hyperparameter : area, Sum two indicators

Fig. 6. Result of Hyperparameter : point, Only Reconstruction Error

Fig. 7. Result of Hyperparameter : point, Multiply two indicators
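The three reconstruction-error options and the critic-score combinations described above can be sketched compactly. This is an illustrative implementation, not the paper's code; in particular, the window length used for the area variant is an assumption, since the paper does not state it.

```python
import numpy as np

def point_diff(x, y):
    """Point-wise difference: mean absolute error between the two series."""
    return float(np.mean(np.abs(np.asarray(x, float) - np.asarray(y, float))))

def area_diff(x, y, window=3):  # window length is an assumption
    """Area difference: mean difference of the areas under the two curves,
    computed with the trapezoidal rule over sliding local windows."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    trapz = lambda v: float(np.sum((v[1:] + v[:-1]) / 2.0))
    return float(np.mean([abs(trapz(x[i:i + window]) - trapz(y[i:i + window]))
                          for i in range(len(x) - window + 1)]))

def dtw(x, y):
    """Dynamic Time Warping: minimum cumulative distance over all alignments."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(x[i - 1] - y[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def anomaly_score(rec_err, critic, mode):
    """The three combinations compared in Table I."""
    if mode == "only_rec":
        return rec_err
    return rec_err + critic if mode == "sum" else rec_err * critic
```

For identical sequences all three reconstruction errors are zero, so the "multiply" combination also vanishes regardless of the critic score.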


Detection close to the anomalies detected by ARIMA models shows that trend, stationarity, and differencing are also worth considering. However, this study has limitations in several respects. First, it does not cover many kinds of data: the research was conducted with data from a single building, so general results could not be obtained. Second, there were too few ways to determine ground-truth; using other statistical models or neural-network-based methods, such as LSTM, could have provided a slightly more diverse perspective. Nevertheless, given how rarely TadGAN has been used industrially, prior research suggests that we have presented a new approach.

Fig. 8. Result of Hyperparameter : point, Sum two indicators

ACKNOWLEDGMENT

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2016-0-00314) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation), and by the Korea Electric Power Research Institute (KEPRI) grant funded by the Korea Electric Power Corporation (KEPCO) (No. R20IA02).

Fig. 9. Result of Hyperparameter : DTW, Only Reconstruction Error

Fig. 10. Result of Hyperparameter : DTW, Multiply two indicators

Fig. 11. Result of Hyperparameter : DTW, Sum two indicators

IV. CONCLUSION

In this paper, we use the TadGAN model to detect anomalies in electric power usage. To evaluate the performance, we specify thresholds using the ARIMA model and set a ground-truth that considers values beyond them as anomalies. We compared nine TadGAN hyperparameter settings with the ground-truth to yield F1-scores. As a result, we show that GAN-based models can be applied to anomaly detection in time-sequential data. Using TadGAN, which, unlike past GAN-based models, takes the cycle consistency of the series into account, we can see that the characteristics of the cycle are represented well.

REFERENCES

[1] Kromanis, Rolands, and Prakash Kripakaran. "Support vector regression for anomaly detection from measurement histories." Advanced Engineering Informatics 27.4 (2013): 486-495.
[2] Ishimtsev, Vladislav, et al. "Conformal k-NN Anomaly Detector for Univariate Data Streams." Conformal and Probabilistic Prediction and Applications. PMLR, 2017.
[3] Jiang, Wenqian, et al. "A GAN-based anomaly detection approach for imbalanced industrial time series." IEEE Access 7 (2019): 143608-143619.
[4] Yaacob, Asrul H., et al. "ARIMA based network anomaly detection." 2010 Second International Conference on Communication Software and Networks. IEEE, 2010.
[5] Erfani, Sarah M., et al. "High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning." Pattern Recognition 58 (2016): 121-134.
[6] Nanduri, Anvardh, and Lance Sherry. "Anomaly detection in aircraft data using Recurrent Neural Networks (RNN)." 2016 Integrated Communications Navigation and Surveillance (ICNS). IEEE, 2016.
[7] Luo, Weixin, Wen Liu, and Shenghua Gao. "Remembering history with convolutional LSTM for anomaly detection." 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2017.
[8] Goodfellow, Ian J., et al. "Generative adversarial networks." arXiv preprint arXiv:1406.2661 (2014).
[9] Geiger, Alexander, et al. "TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks." arXiv preprint arXiv:2009.07769 (2020).

Fourier Random Features and LSTM for Healthcare Irregular Pattern Detection

Ermal Elbasani
Department of Computer Science and Engineering, Sun Moon University, Asan 31460, South Korea.
[email protected]

Jeong-Dong Kim*
Department of Computer Science and Engineering, and Genome-based BioIT Convergence Institute, Sun Moon University, Asan 31460, South Korea.
[email protected]

Abstract— In this era of continuous innovation, healthcare monitoring systems are changing continuously and coming closer to the individual for more detailed monitoring. Wearable technologies are becoming a subcategory of the personal health system that provides continuous health monitoring through non-invasive biomedical, biochemical, and physical measurements. An important topic in health monitoring is detecting irregular patterns that correspond to serious care matters, raising awareness of individual health to prevent chronic disease. In this paper, a Random Fourier Features (RFF) model incorporates the Long Short-Term Memory architecture into kernel learning, which significantly boosts the flexibility and richness of detection while letting deep kernel learning methods perform well at different scales of data. The results indicate that the proposed model performs effectively and can yield satisfactory accuracy in irregular pattern detection.

Keywords—machine learning; healthcare; irregular pattern detection; kernel; random fourier features.

I. INTRODUCTION

Healthcare and wellness are continuously becoming attractive and important topics for scholars and researchers in all fields of science and technology. Recent technologically innovative work has increased the quality of wearable sensor devices as an important means to record health signals, enabling extensive data to be recorded even if most of these data are not used as they could be. In most health care or personalized systems, the data are aggregated into a system to be processed and analyzed by experts or by simple fitness applications. To detect an irregular pattern within such personalized health care data, various methods and algorithms have been designed, including conventional statistical methods and machine learning with supervised and unsupervised methods [2]. However, the methods above cannot always correctly identify an irregular pattern or perform analysis based on extracted health features. Recently, machine learning methods have been used in pattern detection, including anomaly detection, to improve accuracy and achieve higher automation. Among neural networks, those with high performance on time-series data are Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) [2,3]. Recently, kernel methods and neural networks have become two important schemes in the supervised learning field. The theory of kernel methods is widely understood and applied, but kernel method performance in practice, specifically on large datasets, is not as good as that of neural networks [4]. On the other side, neural networks are becoming the most popular method in machine learning with many successful applications, but even basic theoretical properties of neural networks on many issues, such as representation of multimodality, are not yet understood. Random Fourier Features (RFF) methods approximate kernel methods and provide a solution for scalability. Meanwhile, RFF can be easily integrated with neural networks, which makes it a flexible data representation tool.

The remainder of this paper is organized as follows. Section II presents the related work. Section III describes the proposed method. The results are presented and discussed in Section IV. Section V provides the conclusion and future work.

II. RELATED WORK

This research sits at the merger of healthcare data analysis and LSTM for irregular pattern detection. Recently, much research has integrated deep learning approaches for detecting irregular patterns, also called anomalies; a comprehensive overview for general applications can be found in [5]. Many machine learning approaches use automatic electrocardiogram (ECG) feature selection to detect abnormal features for binary classes [6]; however, these classical techniques do not include overall health monitoring at a high level of detection. A significant issue regarding feature extraction in health data is the accurate detection of features that represent anomaly patterns. Other methods base their performance on the nature of sequential data modeling, such as linear dynamic systems and the hidden Markov model [7]. In a supervised setting, RNNs are successful for sequential data fraud detection. Most studies regarding irregular pattern detection in the healthcare domain focus on ECG feature extraction and time series [1]. An RNN that utilizes LSTM uses data points and prediction error likelihood for detecting anomalies [6], increasing performance; another deep learning method is the Convolutional Neural Network (CNN), which can rapidly address high-dimensional data [5]. The integration of random features with deep learning has grown rapidly; hence, it is desirable to have a broad overview of this topic, explaining the connections among various algorithms and theoretical results [8]. Approximations based on random Fourier features have recently emerged as an efficient and agile methodology for designing large-scale kernel machines [9,10].


III. FOURIER RANDOM FEATURES BASED LSTM

The focus of this section is the classification of time series data in the presence of sparse and irregular sampling. Consider a data set of n independent time series D = {S1, . . . , Sn}, where each time series Si is represented as a list of time points xi = [xi1 ... xi|Si|] and a list of corresponding values yi = [yi1 ... yi|Si|]. We assume that each time series is defined over a common continuous-time interval [0, T]. However, for irregularly sampled time series in pattern detection, we do not assume that all of the time series are defined on the same collection of time points, we do not assume that the intervals between time points are uniform, and we do not assume that the number of observations in different time series is the same.

For random Fourier features, we consider random features using feature maps of the form σ(ω · x + b), where (ω, b) is a parameter chosen randomly according to a certain distribution, and σ(·) is a nonlinear function. The first set of random features consists of random Fourier bases cos(ω · x + b), where ω ∈ R^d. These mappings project data points onto a randomly chosen line and then pass the resulting scalar through a sinusoidal function. For a learning set {x, x′}, a positive definite shift-invariant kernel k(x, x′) = k(x − x′) is required; we then compute the Fourier transform q of the kernel:

q(ω) = (1/2π) ∫ e^(−j ω·δ) k(δ) dδ,    (1)

where feature points {ω1, . . . , ωN} are sampled according to this probability distribution; with the feature map φ(ω; x), the result is the expected corresponding feature vector:

φ(x) = √(2/N) [cos(ω1 · x + b1), . . . , cos(ωN · x + bN)]^T.    (2)

IV. RESULT AND DISCUSSION

In this section, the training and evaluation results of the RFF-LSTM method are provided, and the RFF-LSTM method is compared with a stacked LSTM. For the performance evaluation of machine learning for irregular pattern detection, the methods were assessed in terms of the number of false positives (FP) and false negatives (FN). Standard metrics widely used in the evaluation of classification tasks, such as recall, precision, and F-measure, are used.

Figure 2. RFF-LSTM performance

Several tests were repeated by changing the dataset size and evaluating only the F1-measure value for the training step. The results are given in Figure 2: RFF-LSTM performs steadily across different data scales, while stacked-LSTM models show low performance on small-scale data. Compared to the stacked LSTM baseline, RFF-LSTM performed better, and the evaluation based on F1-measurements (Table 1) shows a harmonic balance between the detected classes.
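The construction of Eqs. (1)-(2) can be checked numerically. The sketch below (not from the paper; the dimension, feature count, and test points are made up) uses the RBF kernel, whose Fourier transform is a standard Gaussian, so the inner product of the feature maps approximates the kernel value:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 5000  # input dimension and number of random features (assumed)

# Frequencies w_i ~ N(0, I) (the Fourier transform of the RBF kernel)
# and offsets b_i ~ Uniform[0, 2*pi].
W = rng.standard_normal((N, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=N)

def phi(x):
    """Random Fourier feature map of Eq. (2): sqrt(2/N) * cos(Wx + b)."""
    return np.sqrt(2.0 / N) * np.cos(W @ x + b)

x = np.array([0.2, -0.1, 0.4])
y = np.array([0.0, 0.3, 0.1])
approx = phi(x) @ phi(y)                     # inner product of feature vectors
exact = np.exp(-np.sum((x - y) ** 2) / 2.0)  # true RBF kernel value
print(abs(approx - exact))  # small; the error shrinks as N grows
```

This is the sense in which "the product of two transformed points will approximate the desired shift-invariant kernel."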

Table 1. Performance evaluation at different dataset scales

Dataset size   RFF-LSTM   Stacked-LSTM
10000          90         83
30000          91         88
50000          92         90
70000          92         91

Figure 1. RFF with LSTM for pattern recognition

Moreover, Random Fourier Features LSTM provides a foundation for comparing kernel methods and neural networks (Figure 1). Drawing the directions of these projection lines from a proper distribution ensures that the product of two transformed points will approximate the desired shift-invariant kernel. Notably, the benefits of the proposed mapping before presentation to the LSTM are highlighted in Algorithm 2, which indicates that the posterior is more "peaked" given RFFs as input to the LSTM, compared to the original observations. There is 1 LSTM layer with 30 neurons; in order to compare with our RFF-LSTM, a stacked LSTM with 3 layers of 100 neurons is also designed.

V. CONCLUSION

This study provided a novel model to detect irregular patterns based on health data obtained from wearable devices. The results were promising, showing that the proposed RFF-LSTM structure achieved highly promising performance compared to the stacked LSTM. It also needs to be emphasized that kernel learning methods are becoming effective in combination with deep learning methods.


With RFF, we could establish a simple framework and make every kernel in the RFF layers trainable end-to-end, with no need for additional representations. On small datasets, for which deep learning is generally unsuitable due to overfitting, the results show that the stacked LSTM performs poorly while the proposed RFF-LSTM achieves superior performance. Regarding future studies, datasets of various scales and types should be explored with more experimental results. In addition, combinations with several other methods should be implemented and tested on several datasets to obtain more results for irregular pattern detection.

ACKNOWLEDGMENT

This study is supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1F1A1058394).

REFERENCES

[1] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, 9(8): 1735–1780, 1997.
[2] P. Malhotra, L. Vig, G. Shroff, et al., "Long Short Term Memory Networks for Anomaly Detection in Time Series," European Symposium on Artificial Neural Networks, 22–24, 2015.
[3] S. Chauhan and L. Vig, "Anomaly detection in ECG time signals via deep long short-term memory networks," Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, 1-7, 2015.
[4] Liu, Fanghui, Xiaolin Huang, Yudong Chen, and Johan A. K. Suykens, "Random features for kernel approximation: A survey on algorithms, theory, and beyond," arXiv preprint arXiv:2004.11154 (2020).
[5] A. O. Adewumi and A. A. Akinyelu, "A survey of machine-learning and nature-inspired based credit card fraud detection techniques," International Journal of System Assurance Engineering and Management, 8(2): 937–953, 2017.
[6] A. Szczepański and A. Saeed, "A mobile device system for early warning of ECG anomalies," Sensors, 14(6): 11031–11044, 2014.
[7] M. Längkvist, L. Karlsson, and A. Loutfi, "A review of unsupervised feature learning and deep learning for time-series modeling," Pattern Recognition Letters, 42: 11-24, 2014.
[8] Xie, Jiaxuan, Fanghui Liu, Kaijie Wang, and Xiaolin Huang, "Deep kernel learning via random Fourier features," arXiv preprint arXiv:1910.02660 (2019).
[9] Rahimi, Ali, and Benjamin Recht, "Random Features for Large-Scale Kernel Machines," NIPS, vol. 3, no. 4, p. 5, 2007.
[10] Mitra, Rangeet, and Georges Kaddoum, "Random Fourier Feature Based Deep Learning for Wireless Communications," arXiv preprint arXiv:2101.05254 (2021).

An Augmented Reality based System for Interior Designing

Syeda Aqsa Fatima Rizvi, Mahnoor Javaid, Maryam Bukhari
Department of Computer Science, COMSATS University Islamabad, Attock Campus, Pakistan

Abstract— In today's world, technology is developing rapidly, and so is people's urge to surround themselves with the latest tech gadgets. To meet this demand, AR (Augmented Reality), a mixture of the real and the virtual, is known for its splendid work in interior design and for making shopping easy for people. It facilitates this by creating an atmosphere where people can visualize interior design models virtually in their physical spaces, in the real world and at their comfort, by rotating, dragging, and zooming the models in and out. All this can be done with the help of marker-less Augmented Reality and Google ARCore, a platform for the design and deployment of Augmented Reality. With the help of this framework people can sense the real environment, but not all applications serve in the same way; many applications support augmented reality features that are of little benefit to people, for example SayDuck. The ARCore framework is the solution for interior design applications that let users purchase desired items after viewing them virtually in their real environment.

Keywords— ARCore; Augmented Reality; SayDuck; Marker-Less; Interior Design

I. INTRODUCTION

Arranging interior designs in people's physical space is tedious work if there is an excessive number of interior items to be placed within the area. People can either draw up their physical space to design their room with the help of different computer graphics, or draw it on paper to figure out how it fits within the area. Smartphones have become a vital gadget in our lifestyle, and many mobile applications have been developed to provide ease to people. In the field of shopping, for example, selecting the desired interior design that matches the customer's physical space, or judging whether the design being selected will match designs already placed in that space, is a difficult thing to do. Therefore, there is a need for an application that helps customers buy the correct item for their physical space.

Developers of AR-based interior design applications apply several techniques and algorithms in their applications. But the drawback they face is that their applications do not satisfy people's demands: they lack processing speed, have slow runtime 3D model views, and offer a shortage of product inventory. These issues weaken customer trust, making customers move towards other applications considerably better than these.

The rest of the paper is organized as follows: section II describes some existing work, section III explains the proposed methodology, section IV presents the results, while the last section concludes all the work presented in the paper.

II. RELATED WORK

Making reasonable, intuitive AR applications can be a complicated task because of the basic intricacy of AR content models and the diversity of aspects that must be considered within the content creation process. IKEA launched an AR app in 2017, allowing users to position furniture from the IKEA catalog into their rooms using an iPhone camera. Several e-commerce retailers, including Wayfair and Chairish, have released similar products. Some professional architects have experimented with using AR in their design process.

III. METHODOLOGY

We must be able to ascertain the 3D position in reality. It is regularly determined by the spatial separation between the device and different key points; this is called Simultaneous Localization and Mapping (SLAM). ARCore does two things: tracking the position of the cell phone as it moves, and building its own understanding of the real world. The pictorial representation of the proposed work is shown in Figure 1.

A. Mesh Creation:
The desired interior design model is created by connecting nodes with each other. A mesh consists of triangles arranged in 3D space to create the impression of a solid object.

B. Creation of Mark Seams and Unwrap Mesh:
A seam is where the ends of the image/material are sewn together; we mark seams on our 3D work to indicate where the mesh is cut.

C. Texture Creation:
A texture is simply a standard bitmap image that is applied over the mesh surface. The positioning of the texture is done with the 3D modeling software used to make the mesh. When the 3D shape of the interior design in the 2D picture has been recovered in the perspective of the camera, the mesh is re-projected onto the picture plane.

D. Mesh Rendering:
The Mesh Renderer takes the geometry from the mesh filter and renders it at the position defined by the object's transform component. When the texture has been applied, the UV maps are wrapped together to get the 3D model.


FIGURE 1: Flowchart of proposed Work

E. Final 3D Modeling:
Showing models on the ground is done by the Controller object which, after detecting the ground, places models on it. With the help of the "Lean" plugins, rotation, translation, and scaling of the model are done as given below:

a) Translation: In translation mode, the object can be moved by swiping. For this purpose, a Lean Translate module is utilized. The translation is used to move the item along the x, y, and z axes.

b) Rotation: In rotation mode, the object rotates by dragging, as if making a circle around its center. For this purpose, a Lean Rotate module is utilized. For 3-dimensional rotation there are three kinds of pivot: rotation on the X-axis, rotation on the Y-axis, and rotation on the Z-axis.

c) Scaling: In scaling mode, the object grows or shrinks through zoom-in or zoom-out gestures using two fingers. Scaling applies to an object as a whole; it is illogical to "scale" a single point.

IV. RESULTS

The results produced by the proposed methodology include a GUI that permits clients to select, place, remove, and change 3D models in an interactive manner using distinct touch gestures, with over 90% of the screen filled by the camera stream and rendered models, and a strong backend that handles the application's core functionality, such as identifying gestures, changing rendered 3D models, and loading 3D models from local storage. The accuracy results of indoor and outdoor detection are shown in Table II, while the experimental results of different AR-supported mobile phones are shown in Table I. The presented results show good performance. For each device, both high and low resolution are taken into account for different numbers of people. Results with both platforms, marker-less and sensor-based, are shown in Table I. Furthermore, in Table II different factors are analyzed, such as motion tracking, light estimation, and environmental understanding. The accuracy level for the indoor environment is about 90%, while in the outdoor environment it is 80%. The hardware support considered for these experiments is mobile.


TABLE I. EXPERIMENTAL RESULTS OF DIFFERENT AR SUPPORTED MOBILE PHONES

Supported           Number of   Marker-less                 Sensor-Based
Device              people      Low Res.     High Res.      Low Res.     High Res.
Huawei Nova 3i      1           7.63         0.75           60.34        60.76
                    3           8.06         0.80           60.45        60.56
Samsung Galaxy A8   2           7.60         0.72           60.55        60.66
                    5           7.58         0.78           60.56        60.77
Oppo Reno 2         2           16.34        1.91           6.67         60.78
                    4           14.78        1.80           60.88        60.93

TABLE II. ACCURACY RESULTS OF INDOOR AND OUTDOOR DETECTION
(Factors analyzed: Motion Tracking, Light Estimation, Environmental understanding)

Navigation   Position Accuracy   Stability         Hardware Support   Accuracy level
Indoor       Relatively high     Relatively high   Mobile             90%
Outdoor      Relatively high     Relatively high   Mobile             80%

V. CONCLUSION

In this research paper, we analyzed how marker-less AR is utilized for online interior design. AR has its own special advantages and is remarkably good at handling visualization problems in particular. In an AR environment, purchasing furniture can be advantageous by substantially lowering the chance of item returns. The benefit of this application is that it reduces cost and gives clients multimedia augmentation with highly clear renderings in real time. We are endeavoring to integrate photogrammetry into our current platform, which would permit us to recreate a 3D model of furniture from pictures. At present the client can only visualize the 3D models that are inside local storage; we will connect the application to a cloud repository from which a client can browse interior design models and import them at runtime. Moreover, we intend to reform how we share things by utilizing photogrammetry principles.



Video Anomaly Detection by the Combination of C3D and LSTM

Yuxuan Zhao
Department of Computing, School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China
[email protected]

Ka Lok Man
Department of Computing, School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China; Swinburne University of Technology Sarawak, Malaysia; imec-DistriNet, KU Leuven, Belgium; Kazimieras Simonavicius University, Lithuania; Vytautas Magnus University, Lithuania
[email protected]

Gabriela Mogos
Department of Computing, School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China
[email protected]

Abstract—Video anomaly detection is a significant problem in computer vision. It asks methods to detect unusual events in videos, and the kernel of this task is to produce a correct understanding of the input video. To achieve this target, both spatial and temporal features need to be extracted. Research in image processing has shown that deep convolutional neural networks perform well on spatial feature extraction; the problem therefore becomes how to obtain temporal features from the video. This paper proposes a model that combines two effective temporal feature processing methods, Convolution 3D and Long Short-term Memory, to handle video anomaly detection. We run experiments on a well-known video anomaly dataset, UCF-Crime, and achieve better performance compared with other methods.

Keywords—video anomaly detection; computer vision; deep learning; C3D; LSTM

I. INTRODUCTION

Video anomaly detection is the problem of detecting unforeseeable and emergency events in videos. This task is always tough and time-consuming. Anomaly detection methods aim to automatically detect anomalies in input videos. To achieve this target, methods always focus on spatial and temporal extraction. Unlike images, videos contain temporal features besides spatial features, which can improve detection accuracy. However, this also requires the video anomaly detection method to be able to extract useful temporal features. In this paper, we provide a method that combines two widely used methods, the 3D Convolutional Neural Network (C3D) [1] and Long Short-Term Memory (LSTM) [2], for temporal feature extraction to achieve better performance. In order to make these two methods work together, some modifications to them are made. In addition, we make some improvements to the video processing part because of the problem of computing power.

Fig. 1 The general structure of the proposed method

II. METHODOLOGY

The proposed method aims to get a better detection result than previous methods. To achieve this, it combines C3D and LSTM to improve the detection performance. Considering the limitations of computing power and training time, several improvements have been made to both structures.

The general working process of the model is shown in Fig. 1. The input video is first sampled as RGB frames. The frames are then handled by C3D to get both spatial and temporal features. An LSTM layer is then used to enhance the temporal information and produce the final detection result.

A. Video Processing

The first challenge is computing power. Since the features have one more dimension, C3D features are more complex than the features of a traditional 2D CNN. In addition, longer sequences need multiple LSTM layers or more parameters to extract the temporal information. As a result, more layers should be added or more parameters should be used in a single LSTM layer. However, this leads to the problem of computing power. Therefore, C3D features limit the number of LSTM layers and parameters. To solve

63 2021 International Conference on Digital Contents: AICo(AI, IoT and Contents) Technology (DigiCon-21)

this problem, the number of training samples should be reduced for both C3D and LSTM. For example, we could pick only one part of the video, or pick one frame out of every five. However, if we pick only one part of the video, it is hard to decide which part contains the unforeseeable events; if we pick one frame out of every five, detection performance suffers because the information in the other frames is lost. We therefore divide each video into clips of 16 frames and extract C3D features for every clip, so that C3D produces clip-based features that the LSTM can use as input for further processing. For example, if a video contains 800 frames, we obtain 800/16 = 50 clips. This method has the following advantages:

• It reduces the pressure on computing power, since neither C3D nor LSTM has to handle the frames of a whole video at once.

• It avoids the drawbacks of the other sample-reduction methods: all frames are still used as input, so no information is wasted.

• It allows C3D and LSTM to work together.

B. Clip Processing (C3D)

Convolutional structures have been widely used for spatial feature extraction. To extend this classic model to temporal feature extraction, C3D was developed. The features provided by C3D have one more dimension for the temporal information, so C3D can output both spatial and temporal features. However, the extra dimension also brings the problem of too many calculations in the training process. The clip input alleviates this problem to a certain extent, but further modifications to the C3D structure itself are needed.

Fig. 2 shows the C3D structure in the proposed method. Compared with the original C3D network, two convolutional groups have been removed to simplify the whole model. In addition, every 3D convolutional kernel is set to 3*3*3, which is smaller than the size of a traditional 2D CNN kernel.

C. LSTM

LSTM has been used for unforeseeable-event detection in previous work. In the proposed model, the LSTM layer handles the clip-based features produced by C3D and outputs the video-based features for the final detection. Compared with frame-based features, clip-based features reduce the amount of LSTM computation; the LSTM in the proposed method is therefore a single-directional LSTM layer [3]. This simple structure speeds up the training process while achieving good performance.

III. EXPERIMENTS AND RESULTS

A. Dataset

We conduct experiments on the UCF-Crime dataset [4]. It consists of long untrimmed surveillance videos covering 13 real-world anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. The basic information about this dataset is as follows:

• Number of videos: 1900

• Number of labels: 13

• Total length: 128 hours

• Average frames per video: 7274

• Frame rate: 30 fps

Besides the challenge of anomalous activity detection, some characteristics of this dataset make the experiments more difficult:

• Some videos replay themselves many times in one video file (Abuse001).

• Some videos contain different views of a single event (Fighting032). This leads to dramatic changes in the optical flow, which may make the temporal stream extract wrong features.

• The lengths of different videos vary greatly.

B. Results and discussion

TABLE I shows the detection accuracy of different methods. The proposed method achieves the highest accuracy among these methods, 82.36%, roughly a 7% improvement over the second-best method. The experimental results show that the proposed method performs well on the video anomaly detection task: C3D can work with LSTM to improve the detection result, and the clip-based video processing in this model is shown to effectively reduce the amount of calculation.
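The clip division described above can be sketched as follows; the function name and the choice to drop a trailing partial clip are illustrative assumptions, not details given in the paper.

```python
def split_into_clips(num_frames, clip_len=16):
    """Group frame indices into consecutive fixed-length clips.

    A trailing group shorter than `clip_len` is dropped here (an
    assumption; the paper does not specify how remainders are handled).
    """
    num_clips = num_frames // clip_len
    return [list(range(i * clip_len, (i + 1) * clip_len))
            for i in range(num_clips)]

clips = split_into_clips(800)   # the paper's example: an 800-frame video
print(len(clips))               # -> 50 clips of 16 frames each
```

Each clip would then be passed through C3D to produce one clip-level feature vector, so the LSTM sees a sequence of 50 steps instead of 800 frames.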

Fig. 2 The structure of the C3D
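The effect of the 3*3*3 kernels and the removed convolutional groups can be illustrated with simple shape arithmetic. The sketch below tracks how one clip's (frames, height, width) shape moves through a C3D-style stack; the input resolution, number of blocks, and pooling sizes are illustrative assumptions, not the paper's exact configuration.

```python
def conv3d_out(shape, kernel=3, pad=1, stride=1):
    """Output (T, H, W) of a 3D convolution with a cubic kernel."""
    return tuple((d + 2 * pad - kernel) // stride + 1 for d in shape)

def pool3d_out(shape, pool=(2, 2, 2)):
    """Output (T, H, W) of non-overlapping 3D pooling."""
    return tuple(d // p for d, p in zip(shape, pool))

# One 16-frame clip of 112x112 frames (the spatial size is an assumption).
shape = (16, 112, 112)
shape = conv3d_out(shape)             # 3x3x3 conv, padding 1: shape preserved
shape = pool3d_out(shape, (1, 2, 2))  # pool only spatially at first
shape = conv3d_out(shape)
shape = pool3d_out(shape)             # later pooling halves time as well
print(shape)  # -> (8, 28, 28)
```

A padded 3*3*3 convolution preserves the clip shape while each pooling stage halves it, so removing two convolutional groups directly removes two of the most expensive blocks in the network.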


TABLE I. PERFORMANCES OF DIFFERENT MODELS ON UCF-CRIME DATASET

Method              Accuracy (%)
Hasan et al. [5]    50.6
Lu et al. [6]       65.51
Sultani et al. [7]  75.41
Proposed method     82.36

ACKNOWLEDGMENT

This article is supported by Xi'an Jiaotong-Liverpool University (XJTLU), Suzhou, China, through the Research Development Fund (RDF-15-01-01). Ka Lok Man wishes to thank the AI University Research Centre (AI-URC), Xi'an Jiaotong-Liverpool University, Suzhou, China, for supporting his related research contributions to this article through the XJTLU Key Programme Special Fund (KSF-E-65) and the Suzhou-Leuven IoT & AI Cluster Fund.

IV. CONCLUSION

This paper proposes a new model for video anomaly detection. The model combines C3D and LSTM to achieve better detection accuracy. C3D is used to extract clip-based spatiotemporal features of the video; a single-directional LSTM layer then enhances the temporal features and produces the final video-based features and detection results for the whole stream. The model is evaluated on the UCF-Crime video dataset, and the result shows that our method achieves the highest detection accuracy, 82.36%, compared with the other methods. This confirms that C3D and LSTM can work together in the video anomaly detection task. Future work will focus on simplifying this model: since both C3D and LSTM demand considerable computing power and affect the training time, modifications are necessary to address this problem.

REFERENCES

[1] Tran, D., Bourdev, L., Fergus, R., Torresani, L. and Paluri, M., 2015. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4489-4497).
[2] Hochreiter, S. and Schmidhuber, J., 1997. Long short-term memory. Neural Computation, 9(8), pp. 1735-1780.
[3] Zhao, Y., Man, K.L., Smith, J., Siddique, K. and Guan, S.U., 2020. Improved two-stream model for human action recognition. EURASIP Journal on Image and Video Processing, 2020(1), pp. 1-9.
[4] Sultani, W., Chen, C. and Shah, M., 2018. Real-world anomaly detection in surveillance videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6479-6488).
[5] Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A.K. and Davis, L.S., 2016. Learning temporal regularity in video sequences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 733-742).
[6] Lu, C., Shi, J. and Jia, J., 2013. Abnormal event detection at 150 fps in Matlab. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2720-2727).
[7] Sultani, W., Chen, C. and Shah, M., 2018. Real-world anomaly detection in surveillance videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6479-6488).

A Study on Radar Target Detection and Classification Methodologies

Yu Du
Department of Electrical Engineering and Electronics, School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China
[email protected]

Jeremy S. Smith
Department of Electrical Engineering and Electronics, University of Liverpool, UK
[email protected]

Eng Gee Lim
Department of Communications and Networking, School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China
[email protected]

Ka Lok Man
Department of Computing, School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China
[email protected]

Abstract—Target detection and classification is the most important part of scene analysis in autonomous driving and intelligent transportation systems (ITS). Moreover, robust all-weather sensing is one of the indispensable characteristics of the sensing elements in such systems, and radar has shown great potential as the major sensor for this task thanks to the environmental insensitivity of electromagnetic waves. However, unlike the informative images captured by cameras, reliable classification and detection of objects by radar sensors in real time has proved to be quite challenging. Toward this goal, a comprehensive study of radar perception methodologies is a good stepping stone. In this paper, major classical and state-of-the-art approaches are thoroughly summarized, and several potential deep neural network attempts are discussed in comparison with traditional methodologies.

Keywords—Radar perception; ITS; autonomous driving; deep learning; CNN

I. INTRODUCTION

Radar is considered an all-weather sensor, independent of illumination conditions and with good dust penetrability, while autonomous driving and intelligent transportation surveillance systems require reliable instantaneous monitoring of vehicle surroundings. This means that radar sensing devices are an appropriate solution as the core component of these two kinds of systems.

Target and obstacle detection is one of the most classical applications of radar technologies, with a long industrial history. Based on the unique echoes from different targets, radar is able to determine from the received data whether a target is present or there is only noise. Traditionally, three steps are taken in the radar signal processing chain: matched filtering, Doppler processing, and hypothesis testing, as illustrated in Figure 1. The Constant False Alarm Rate (CFAR) method, with decision making based on the Neyman-Pearson rule, is the most widely used hypothesis test [1, Ch. 7]; its effectiveness has been proven in many use cases, but it requires deep knowledge of radar principles and rich experience to handle. As the popularity of Deep Neural Networks (DNNs) [2] increases, DNNs have been proposed for radar applications to lower the barrier of radar implementation and to address some corner cases of the traditional approaches.

Besides, object classification is always a key issue for radars. Normally, radars use Doppler characteristics and Radar Cross-Section (RCS) values for a rough estimate of the object type [3], which limits the number of object types, and the results can vary with viewpoint. Imaging radars launched in recent years could potentially break through this bottleneck [4] by providing many denser reflection points, together with the help of new feature extraction algorithms such as DNNs. Different attempts have been made at different stages of radar signal processing and have shown promising results.

In the rest of this paper, Section II introduces the traditional radar detection methodologies, and Section III discusses some DNNs applied to radar object detection. Section IV focuses on the traditional radar classification approaches, and DNN-based classification proposals are the content of Section V. Finally, conclusions are drawn in Section VI together with related outlooks.

Figure 1: Conventional radar signal processing chain
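As a concrete illustration of the CFAR idea mentioned above, the sketch below implements a minimal one-dimensional cell-averaging CFAR: the threshold for each cell under test is a scaled average of the surrounding training cells, with guard cells excluded. The window sizes, scaling factor, and synthetic data are illustrative assumptions, not values from any particular radar.

```python
def ca_cfar(power, num_train=8, num_guard=2, scale=4.0):
    """Return indices whose power exceeds scale * local noise average."""
    detections = []
    half = num_train // 2 + num_guard
    for i in range(half, len(power) - half):
        # Training cells on both sides, skipping the guard cells.
        train = (power[i - half:i - num_guard] +
                 power[i + num_guard + 1:i + half + 1])
        noise = sum(train) / len(train)
        if power[i] > scale * noise:
            detections.append(i)
    return detections

# Flat noise floor of 1.0 with a strong echo injected at cell 20.
cells = [1.0] * 40
cells[20] = 25.0
print(ca_cfar(cells))  # -> [20]
```

Because the threshold tracks the local noise estimate, the false-alarm rate stays roughly constant as the noise floor varies, which is exactly the property the Neyman-Pearson-style test is chosen for.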


II. TRADITIONAL RADAR OBJECT DETECTION

Conventional radar operates with a sequence of frequency-modulated continuous-wave (FMCW) signals and transmits multiple chirps in each probe. In the typical three-step processing chain shown in Figure 1, Doppler processing is the key procedure for extracting information from the filtered radar echoes.

Doppler processing is normally conducted by multidimensional Fast Fourier Transform (FFT) [5]; different maps can be generated by referring to any two dimensions according to the purpose, for instance the range-Doppler map or the distance-velocity map. For target detection, a range-azimuth map is normally considered as the input of the CFAR detector. Figure 2 shows a schematic range-Doppler map [5]: the peak in Figure 2(a) represents a present target, while Figure 2(b), which looks almost flat, has no target in the field of view. The CFAR detector operates on each range-Doppler cell separately; the number of cells occupied by a target can also be used to estimate the size of the target.

Fig. 2 Range-Doppler spectrum after Doppler processing - (a) with target present and (b) with target absent

Additionally, to enhance the plausibility of target detection, the detection can be repeated multiple times with different ramps to evaluate the target presence probability comprehensively, which is known as the coincidence detection process [6]. One representative coincidence detection method is binary integration, introduced no later than 1998 [1, Ch. 6], which requires at least N positive judgments among M repeated decisions. The sketch in Figure 3 indicates the relationship of binary integration to the conventional radar signal processing chain.

Fig. 3: Coincidence detection process with binary integration

III. DNN APPROACHES IN RADAR TARGET DETECTION

In deep-learning terms, target detection can be considered a binary hypothesis-testing problem (target present or not), so to apply DNN algorithms to radar target detection, the input must be defined properly. Typically, there are two possible data representations that can be considered as the input to a DNN: the radar frequency (RF) data after Doppler processing, and the radar location points. The RF data are the output spectra after the FFTs, and the radar location points are derived by peak detection from the RF data.

There is a clear difference between Figure 2(a) and Figure 2(b), which shows that the range-Doppler spectrum differs significantly with or without a target present. Moreover, the range-Doppler spectrum and the other spectra generated by the multi-dimensional FFT not only preserve all the information available in the raw signal, but also yield a data representation in the form of a map or image, on which powerful deep learning methods such as Convolutional Neural Networks (CNNs) [2] can be applied in much the same way as in computer vision. Based on this idea, the CFAR detector can be replaced with CNN detectors taking different dimensions of the spectrum, as illustrated in Figure 4.

Fig. 4: CNN detector replacing the CFAR detector

Continuing with the range-Doppler spectrum, a study of a CNN detector has been conducted in [5]: a detector slides over the spectrum with a fixed window size, designed to be slightly larger than the known target echo peak in the training data. The proposed CNN detector contains a total of 8 layers: two convolutional layers, two Rectified Linear Unit (ReLU) layers [2], two max-pooling layers, and two fully connected layers. The ReLU and max-pooling layers work as feature extractors, while the fully connected layers work as the classifier. The above-mentioned coincidence detection process is also applied in this study as multilayered input to the CNN detector, as shown in Figure 5. The CNN detector parameters are trained on the collected data by back-propagation and stochastic gradient descent [7]. In general, the CNN detector achieves slightly better performance than the CFAR detector, and much more promising performance when dealing with large signal-to-noise ratio (SNR) targets.

Fig. 5: CNN detector replacing the CFAR detector

In contrast to using only radar data as the input to DNN detectors, a more convenient and high-performance cross-modal concept was introduced in [8], which uses synchronized camera and radar data with the camera as the leader and the radar as the follower. The camera uses a reliable and mature probabilistic-driven concept for object detection and location estimation by means of a camera-radar fusion strategy to generate annotations. The radar reflection range-azimuth map is taken as input, and the object confidence map is predicted via a radar object detection net (RODNet) trained on these annotations. From the confidence map, the object location and even its class can be inferred. This well-trained RODNet has shown favorable object detection performance using radar only; the corresponding pipeline is shown in Figure 6.

Fig. 6: Cross-modal pipeline for radar object detection

Similarly, the training of a radar detection net could also be supervised by lidar data, or radar points could be applied as input to certain lidar point-cloud-based methods. Lidar-based DNN algorithms take the dense lidar point cloud, which outlines objects in 3D, as input for object detection and classification [9]. However, a conventional radar point cloud is much sparser and noisier than a lidar point cloud; especially for relatively distant objects, normally no more than 5 points are presented. This means that to use radar points as the data representation for target detection by DNN, an imaging radar is an essential element. Beyond the relatively high cost of imaging radar, point-cloud representations lose a substantial amount of information characteristic for object aggregation [10], yet for almost every commercially ready radar, the point cloud of reflections is the only output option available.

To recover the lost information, [11] has proposed a method that combines multiple location-level data into a radar cube and crops a smaller block around each object in three dimensions rather than using 2D maps. The proposed network consists of three parts. The first part encodes the data in the spatial domains (range, azimuth) and grasps the surroundings' Doppler distribution. The second part is applied to this output to extract class information from the distribution of location speed. Finally, the third part provides classification scores through two fully connected layers; the output is the confidence score of each corresponding class. In the latter case, an ensemble voting step that combines the results of several binary classifiers can enhance the score of the proposed network, similar to [12].

IV. TRADITIONAL RADAR CLASSIFICATION APPROACHES

In current automotive radar classification approaches, a statistical detection algorithm such as CFAR is normally the first step, curtailing the information of the multi-dimensional spectrum to a set of location points because of the limited computation resources of the onboard chipset [13]. The reflections are then normally grouped via clustering algorithms, according to positional or kinematic relationships, before classification [14]. A location point, or a group of location points, typically contains six-dimensional information: the radial distance, the azimuth and elevation angles, the radial velocity, the RCS, and the micro-Doppler representative. These values are known as location-level data.

Common traffic participants have different kinematic and/or physical characteristics; for example, a summary of the RCS of standard road objects is shown in Table 1. In addition to the Doppler shift induced by the bulk motion of a radar target, when the target or any structure on it undergoes micro-motion dynamics, such as the mechanical swinging of a pedestrian's arms and legs or the rotation of the spokes of a bicycle, these micro-motions induce Doppler modulations on the returned signal, referred to as the micro-Doppler effect [15]. By carefully measuring and processing the micro-Doppler signature, more precise classification over more object types is achievable.

Table 1: Average RCS for different road objects

Based on these statistical data and traditional machine learning methodology embedded in a microcontroller, point-cloud-based methods work well for distinguishing object classes with distinctive shapes, or with standard characteristics of each common object from a certain viewpoint [14]. This makes the traditional classification approaches able to handle predefined NCAP (New Car Assessment Protocol) scenarios with limited, standard obstacles; for the harder tasks to come, however, the method is severely limited by the information lost in CFAR detection.

V. DEEP-LEARNING BASED OBJECT CLASSIFICATION

Simple localization of potential obstacles and classification of standard objects in a predefined driving path can no longer fulfill the requirements of the rapidly growing autonomous driving and intelligent transportation industries. As in the aforementioned pipeline of deep-learning-based object detection, the spectra generated by multi-dimensional FFT and the radar location points are considered as two different types of input data for deep learning networks.
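To make the contrast between the two input types concrete, the sketch below rasterizes a handful of location-level points (range, azimuth, RCS) into a coarse 2D grid of the kind a CNN could consume. The grid resolution, value ranges, and the choice to accumulate RCS into the cells are illustrative assumptions, not details from any of the cited systems.

```python
def points_to_grid(points, n_range=8, n_az=8,
                   max_range=100.0, max_az=180.0):
    """Rasterize (range_m, azimuth_deg, rcs) points into a 2D grid.

    Each point's RCS is accumulated into the cell covering its
    (range, azimuth) position; azimuth is taken in [0, max_az).
    """
    grid = [[0.0] * n_az for _ in range(n_range)]
    for rng, az, rcs in points:
        i = min(int(rng / max_range * n_range), n_range - 1)
        j = min(int(az / max_az * n_az), n_az - 1)
        grid[i][j] += rcs
    return grid

# Three hypothetical reflections from one distant object.
pts = [(60.0, 45.0, 2.0), (61.0, 46.0, 1.5), (62.0, 47.0, 0.5)]
grid = points_to_grid(pts)
occupied = sum(1 for row in grid for v in row if v > 0)
print(occupied)  # -> 1: all three points fall into one coarse cell
```

The example also shows the information loss the text describes: a sparse cluster of reflections collapses into a single grid cell, whereas a full FFT spectrum would retain the underlying signal structure.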


Research into spectrum-based data-driven approaches is being boosted by the computer vision community, and many researchers are focusing on transferring image processing methods to radar, which may also explain why most deep learning approaches use the spectrum as input [16]. Spectrum-based approaches use a grid representation of the radar data. [14] has introduced an effective method to classify various static objects. Since the scenario is static, the Doppler spectrum is not applicable; instead, [14] uses a range-velocity spectrum computed via 2D-FFT. In addition to this range-velocity spectrum, the azimuth of each range-velocity bin is calculated and its magnitude recorded, which results in a 3D range-velocity-azimuth spectrum. [14] chose CFAR for the initial object detection because of the high maturity of the algorithm, and then applied a region of interest (ROI) around the bin occupied by the detected target; the ROI is slightly larger than the detected object, with the peak at the center of the ROI.

The ROIs are taken as the input to a CNN. The proposed network of [14] consists of 3 convolutional layers with 3x3 kernels, 3 average pooling layers, and 2 fully connected layers. The most creative part of the proposed network is the radar-specific approach of incorporating prior information about the location of the ROI within the entire radar field of view, suppressing the interfering reflections in the ROI. The interference is caused by the expanded physical area covered by the ROI at larger ranges, a consequence of the polar coordinate system used by radar sensors. [14] created a distance-to-center map that assigns to each bin of the ROI its physical distance from the ROI center. Thereby the CNN can learn to leverage both azimuth and range information by applying filters across the two input channels to extract object-specific features.

The above CNN shows promising results. However, because radar reflections are sensitive to the angle from which an object is seen, the radar spectrum, and with it the classification, can change abruptly from one frame to another. [14] and [17] have proposed a computationally friendly approach that applies a temporal filter across frames and predicts the object class from the majority vote over the frames, which is similar in spirit to a recurrent neural network architecture.

Using the spectrum as input to deep learning networks shows good results, but spectrum data are quite sensitive to the modulation and sensor properties. Each radar system therefore needs its own training data set, and overall it becomes hard to generalize the performance between versions and mounting positions of radar systems. The reflections of radar points as inputs to DNNs are therefore still well worth studying.

Radar point-cloud-based approaches usually accumulate data over several measurements; some approaches use an occupancy grid map for object type classification in stationary environments [18]. In highly dynamic scenarios, classification is conducted on every single reflection, and data associated with nearby objects are omitted.

[19] has employed the PointNet [20] and PointNet++ [21] methods as the basis and adapted the architecture to radar characteristics. Since the radar reflection density is sparser than a lidar point cloud, a multi-scale grouping (MSG) module as in [22] is used for point selection, in which grouping and feature generation are applied to the radar reflections so that the MSG-processed radar data appear as even as lidar data processed by MSG. Such a process is analogous to convolutional networks for image processing, where the image dimensions are reduced in each layer, as shown in Figure 7. The training data are manually grouped and annotated with labels.

Fig. 7: Examples of radar point cloud and multi-scale grouping

VI. CONCLUSION AND OUTLOOK

Deep learning in the radar sensing territory is still at an early stage, but some approaches have already proven very promising. Deep learning algorithms are able to learn favorable representations of the raw input data in the deeper layers of neural networks, which capture the crucial features necessary for object classification and exhibit invariances to viewpoint, surrounding conditions, noise, and other transformations. As input to DNNs, the point cloud is more practical to generalize across different radar systems, which makes it more appropriate for industrialization purposes. The spectrum is a more informative input type for a deep learning network and could theoretically give better performance, especially under complex scenarios, in both the target detection and the classification aspect; but it is also computationally complex, which makes it hard to implement in an embedded system.

In general, from a development point of view, with big data booming and the connection-of-everything concept evolving, data-driven detectors and classifiers based on more informative source data would be the more prospective pipeline, while more cross-domain research among radar, camera, and lidar sensors on their common sensing methodology and fusion concepts should be very meaningful for future use cases.

ACKNOWLEDGMENT

Ka Lok Man wishes to thank the AI University Research Centre (AI-URC), Xi'an Jiaotong-Liverpool University, Suzhou, China, for supporting his related research contributions to this paper through the XJTLU Key Programme Special Fund (KSF-E-65) and the Suzhou-Leuven IoT & AI Cluster Fund.

REFERENCES

[1] L. Nicolaescu and T. Oroian, "Radar cross section," 5th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Service (TELSIKS 2001), Proceedings of Papers (Cat. No.01EX517), 2001, pp. 65-68 vol. 1, doi: 10.1109/TELSKS.2001.954850.
[2] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 1st ed. Cambridge, MA, USA: MIT Press, 2016. [Online]. Available: http://www.deeplearningbook.org.
[3] M. A. Richards, Fundamentals of Radar Signal Processing, 1st ed. New York, NY, USA: McGraw-Hill, 2005.
[4] M. Gottinger et al., "Coherent Automotive Radar Networks: The Next Generation of Radar-Based Imaging and Mapping," in IEEE Journal of Microwaves, vol. 1, no. 1, pp. 149-163, winter 2021, doi: 10.1109/JMW.2020.3034475.
[5] V. Winkler, "Range Doppler detection for automotive FMCW radars," 2007 European Radar Conference, 2007, pp. 166-169, doi: 10.1109/EURAD.2007.4404963.
[6] L. Wang, J. Tang and Q. Liao, "A Study on Radar Target Detection Based on Deep Neural Networks," in IEEE Sensors Letters, vol. 3, no. 3, pp. 1-4, March 2019, Art no. 7000504, doi: 10.1109/LSENS.2019.2896072.
[7] S. Amari, "Backpropagation and stochastic gradient descent method," Neurocomputing, vol. 5, no. 4-5, pp. 185-196, 1993, ISSN 0925-2312, doi: 10.1016/0925-2312(93)90006-O.
[8] Y. Wang, Z. Jiang, Y. Li, J.-N. Hwang, G. Xing and H. Liu, "RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization," in IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 4, pp. 954-967, June 2021, doi: 10.1109/JSTSP.2021.3058895.
[9] R. Sahba, A. Sahba, M. Jamshidi and P. Rad, "3D Object Detection Based on LiDAR Data," 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2019, pp. 0511-0514, doi: 10.1109/UEMCON47517.2019.8993088.
[10] K. Patel, K. Rambach, T. Visentin, D. Rusev, M. Pfeiffer and B. Yang, "Deep Learning-based Object Classification on Automotive Radar Spectra," 2019 IEEE Radar Conference (RadarConf), 2019, pp. 1-6, doi: 10.1109/RADAR.2019.8835775.
[11] A. Palffy, J. Dong, J. F. P. Kooij and D. M. Gavrila, "CNN Based Road User Detection Using the 3D Radar Cube," in IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1263-1270, April 2020, doi: 10.1109/LRA.2020.2967272.
[12] N. Scheiner, N. Appenrodt, J. Dickmann, and B. Sick, "Radar-based road user classification and novelty detection with recurrent neural network ensembles," in 2019 IEEE Intelligent Vehicles Symposium (IV 2019), pp. 722-729.
[13] M. Ulrich, C. Gläser and F. Timm, "DeepReflecs: Deep Learning for Automotive Object Classification with Radar Reflections," 2020.
[14] K. Patel, K. Rambach, T. Visentin, D. Rusev, M. Pfeiffer and B. Yang, "Deep Learning-based Object Classification on Automotive Radar Spectra," 2019 IEEE Radar Conference (RadarConf), 2019, pp. 1-6, doi: 10.1109/RADAR.2019.8835775.
[15] V. C. Chen, F. Li, S.-S. Ho and H. Wechsler, "Micro-Doppler effect in radar: phenomenon, model, and simulation study," in IEEE Transactions on Aerospace and Electronic Systems, vol. 42, no. 1, pp. 2-21, Jan. 2006, doi: 10.1109/TAES.2006.1603402.
[16] Y. Kim and T. Moon, "Human Detection and Activity Classification Based on Micro-Doppler Signatures Using Deep Convolutional Neural Networks," in IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 1, pp. 8-12, Jan. 2016, doi: 10.1109/LGRS.2015.2491329.
[17] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, pp. 1735-1780, 1997.
[18] J. Lombacher, M. Hahn, J. Dickmann and C. Wöhler, "Potential of radar for static object classification using deep learning methods," 2016 IEEE MTT-S International Conference on Microwaves for Intelligent Mobility (ICMIM), 2016, pp. 1-4, doi: 10.1109/ICMIM.2016.7533931.
[19] O. Schumann, M. Hahn, J. Dickmann and C. Wöhler, "Semantic Segmentation on Radar Point Clouds," 2018 21st International Conference on Information Fusion (FUSION), 2018, pp. 2179-2186, doi: 10.23919/ICIF.2018.8455344.
[20] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space," arXiv preprint, Jun. 2017. [Online]. Available: http://arxiv.org/abs/1706.02413
[21] R. Q. Charles, H. Su, M. Kaichun and L. J. Guibas, "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 77-85, doi: 10.1109/CVPR.2017.16.
[22] E. Schubert, F. Meinl, M. Kunert and W. Menzel, "Clustering of high-resolution automotive radar detections and subsequent feature extraction for classification of road users," 2015 16th International Radar Symposium (IRS), 2015, pp. 174-179, doi: 10.1109/IRS.2015.7226315.

Modeling of Mechatronic Systems with Multi-Axis

Liye Jia
Department of Mechatronics and Robotics, School of Advanced Technology, Xi'an Jiaotong-Liverpool University, 215123 P.R. China
[email protected]

Yujia Zhai*
Department of Mechatronics and Robotics, School of Advanced Technology, Xi'an Jiaotong-Liverpool University, 215123 P.R. China
[email protected]

Kejun Qian
Suzhou Power Grid Company, China
[email protected]

Sanghyuk Lee
Department of Mechatronics and Robotics, School of Advanced Technology, Xi’an Jiaotong-Liverpool University, 215123 P.R. China
[email protected]

Ka-Lok Man
School of Advanced Technology, Xi’an Jiaotong-Liverpool University, 215123 P.R. China
[email protected]

Abstract—Recently, the market has imposed rigorous requirements on accuracy and efficiency in the manufacturing industry, so the multi-axis system has been used in many advanced industrial devices to meet the current demand. Using an independent servo-motor to actuate a single movable axis is the basic principle of the multi-axis mechanism. However, asynchronous problems will severely reduce the performance of this mechanical system. Therefore, to compensate for the synchronous error caused by external disturbances, the multiple axes should be simultaneously controlled by a feedback system to remove the influence of parameter variation from the servo-motor movement. B&R Company provides the Evaluation and Training for Automation (ETA) workstation for testing such control methods. Based on the practical structure of the ETA, this paper's objective is to create a digital dynamic model using MATLAB that replaces the workstation with the Digital Twin technique, so that the output of the virtual model can be applied as the control signal to constrain the motion of the real mechanisms.

Keywords—Multi-Axis, ETA, Simscape Multibody, Creo, Lagrange’s Equation, Trajectory Plan, Digital Twin

I. INTRODUCTION

The multi-axis system is a foundational component of the modern manufacturing industry because of its essential properties, such as quick response and tension smoothing [1]. However, the nonlinear friction, parameter variation, and external disturbances of the servo-motors introduce synchronous errors that degrade the accuracy of the multi-axis system [2]. For example, the printer is a commonplace machine that relies on multi-axis techniques, and malposition will appear on the printed sheet if a disturbance makes the printer's multi-axis system vibrate.

For the elimination of the synchronous error, many engineers have proposed corresponding compensation methods. For example, Feng, Koren, and Borenstein developed the Cross-coupling Motion Controller based on the velocity feedback loop in 1993, providing double-axis synchronization control principles [3]. Subsequently, in 2003, Sun discussed an adaptive coupling system based on the idea of Feng's team [4]; using Sun's algorithm, the synchronization and position errors converge to zero while the multi-axis system is operating. Moreover, Zhong and his team created an adaptive-fuzzy friction compensator with the LuGre friction model to solve the problems caused by nonlinear friction [5]. Hence, after the Digital Twin model of the ETA is accomplished, verifying a control strategy that combines these predecessors' principles in the virtual environment economizes on experimental materials.

In this paper, the objective is to introduce the detailed process of the digital model implementation, and the simulation of the model is demonstrated. The development method of the completed model is presented as the conclusion of this study.
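The cross-coupling idea referenced above can be illustrated with a toy two-axis example. The sketch below is not the controller of [3] or [4]; it is a minimal Python illustration with unit-inertia axes and assumed values for the reference speed, per-axis gain, coupling gain, and disturbance, showing why feeding the synchronization error back into both axes keeps them aligned.

```python
def simulate(coupling_gain, dt=1e-3, T=5.0):
    """Toy cross-coupling illustration: two velocity-controlled axes with
    unit inertia track the same reference speed; axis 2 suffers a constant
    load disturbance. Returns the largest synchronization (position) error."""
    w_ref = 10.0   # common reference speed (illustrative)
    kp = 5.0       # per-axis proportional velocity gain (illustrative)
    d2 = 2.0       # constant disturbance torque on axis 2 (illustrative)
    th1 = th2 = w1 = w2 = 0.0
    max_sync = 0.0
    for _ in range(int(T / dt)):
        e_sync = th1 - th2                              # synchronization error
        u1 = kp * (w_ref - w1) - coupling_gain * e_sync  # leading axis slowed
        u2 = kp * (w_ref - w2) + coupling_gain * e_sync - d2  # lagging axis pushed
        w1 += u1 * dt        # semi-implicit Euler, unit inertia
        w2 += u2 * dt
        th1 += w1 * dt
        th2 += w2 * dt
        max_sync = max(max_sync, abs(e_sync))
    return max_sync
```

Without coupling, the disturbed axis lags and the position gap grows without bound; with the coupled correction the gap settles near d2/(2*coupling_gain), mirroring the convergence behaviour that the cited schemes establish rigorously.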



II. METHODOLOGY

A. Introduction to B&R’s ETA Workstation

The ETA is the basis of the MATLAB dynamic model, and its appearance is displayed in Fig. 1. Two gears enlarge the rotation of the rotors of the servo-motors, and the upper and lower gears correspond respectively to the master and slave axes of the double-axis system. Furthermore, two plastic plates should be mounted on the breaches of the slave gear to indicate the motion errors of the workstation. If the collision mark is on the left plate, the reason for the synchronous error is the faster speed of the master. Conversely, the right notch indicates an error caused by the slave mechanism.

On the other hand, the electronic system is a significant reference for creating the virtual actuation of the digital model. The system combines two Permanent Magnet Synchronous Motors (PMSM) with a Programmable Logic Controller (PLC). The PMSM satisfies the requirement of precise movement [6], and the engineer can adjust the position or velocity values to control the servo-motor through the EnDat encoder of B&R's 8LS PMSM. The encoder stores the motion profiles of the rotor for analyzing the performance of the motor or for creating the feedback compensators [7]. Therefore, the synchronization coupling can be implemented: because of the EnDat encoder, the master motion profile can control the slave motor, and the PLC of the slave actuation system generates commands based on the control signal to make the slave movement coordinate with the master axis [8].

In brief, the Simscape model will be created based on the principles presented in this section.

Fig. 1. Appearance of B&R’s ETA Workstation.

B. Creo Model and Derivation

Because it is inconvenient to perform collision analyses on B&R's physical workstation in the MATLAB environment, a 3-D model with characteristics similar to the ETA is built in Creo to obtain the results of the mass analysis and to provide a virtual facility for geometric calculations, such as finding the contact points where the colliding forces act. Moreover, the Simscape dynamic model can use the Creo virtual entities to increase the reliability of the simulation. Fig. 2 displays the appearance of the completed 3-D Creo model in two different views, and Table I contains the moments of inertia of the two axes derived by the mass analysis function of Creo.

On the other hand, as the kinematic model, the virtual structures of the two axes are combined under the Pin property, which means the gears can be manually rotated clockwise or anticlockwise. Thus, rotating the slave axis to the collision-instantaneous orientation of the double-axis system is the method for obtaining the contact points by mathematical computation in Creo. With the measured dimensions of the mechanical structure, the digital coordinates of the contact points can be calculated relative to the rotational center of each axis. Therefore, the derivation of the Creo kinematic model consists of importing the virtual mechanisms and the relevant contact points into the Simscape dynamic model, and the accomplishments in this section are the basis for simulating the actual collision.

Fig. 2. The Illustration of Creo Kinematic Model.

TABLE I. CREO MASS ANALYSIS RESULTS

Symbol   Quantity                                 Value (Unit)
J_M      moment of inertia of master mechanism    1.064 × 10^5 g·mm²
J_S      moment of inertia of slave mechanism     1.324 × 10^5 g·mm²

TABLE III. DIGITAL COORDINATES OF POINTS ON SLAVE AXIS

Symbol   General Location        Digital Coordinates (Units: mm)
A'01     on upper plate, left    (−8.89, 56.14)
A'02     on upper plate, right   (8.89, 56.14)
A'11     on lower plate, left    (−8.89, −56.14)
A'12     on lower plate, right   (8.89, −56.14)

C. Lagrange’s Equation

For simulating the characteristics of the 8LS servo-motor in the dynamic model, the expected displacement curves should be the actuating signal that controls the rotation of the Simscape model. However, the position-control structure of the MATLAB function block requires the velocity and acceleration corresponding to the desired displacement trajectory, so its complexity is unsuitable for creating the feedback-control system in the digital environment. Hence, the work mode should be the Torque-Mode, and the converter designed to calculate the corresponding torque from the expected position expressions of the Trajectory Plan relies on the motion equations of the double-axis mechanism.

The foundation of Lagrange's Equation is energy, so this principle can derive the motion equations with simple mathematical calculation [9]. To further decrease the computational complexity, Ogar's presentation of Lagrange's Equation is applied in this study [10]:

\frac{d}{dt}\frac{\partial T}{\partial \dot{q}_k} - \frac{\partial T}{\partial q_k} + \frac{\partial D}{\partial \dot{q}_k} + \frac{\partial V}{\partial q_k} = Q_k    (1.1)

The symbols of Equation (1.1) are explained in Table IV, and the motion equations of the two axes can be computed with the moments of inertia in Table I and the assumed stiffness and damping of the workstation, where the stiffness is k = 0.5 Nm/deg and the damping is b = 0.003 Nm/(deg/s):

J_M \ddot{\theta}_M + b \dot{\theta}_M + k \theta_M = \tau_M    (1.2)

J_S \ddot{\theta}_S + b \dot{\theta}_S + k \theta_S = \tau_S    (1.3)

Afterward, for the master axis, the actuating converter is determined by Equation (1.2), and Equation (1.3) provides the basis of the slave converter.

TABLE IV. SYMBOLS FOR OGAR'S LAGRANGE EQUATION [10]

Symbol   Quantity             Expression
T        kinetic energy       (1/2)Mẋ² or (1/2)Jθ̇²
V        potential energy     (1/2)k(x₁ − x₂)² or (1/2)k(θ₁ − θ₂)²
D        dissipation factor   (1/2)b(ẋ₁ − ẋ₂)² or (1/2)b(θ̇₁ − θ̇₂)²
Q        external force       F
q        degree of freedom    x or θ
q̇        derivative of q      v or ω

D. Trajectory Plan

The Trajectory Plan is a method for verifying the function of the Lagrange converter and for ensuring that the designed motion is free of the oscillation caused by discontinuous signals. Comparing the output diagrams of the simulation with the desired trajectory curves is a considerable verification of the correctness of the actuation system. Moreover, with at least four constraints, including the initial and target values of the position, the velocity, and the acceleration [11], the coefficients of the mathematical expression of the position can be calculated to support the torque computation. With the conditions displayed in Table V, the expressions of the expected displacement curves are:

\theta_M(t) = 0.216\,t^5 - 5.4\,t^4 + 36\,t^3    (2.1)

\theta_S(t) = -0.216\,t^5 + 5.4\,t^4 - 36\,t^3    (2.2)

With Equations (2.1) and (2.2), the motion trajectories of the two axes can be plotted by the MATLAB program, and the plots are exhibited separately in Fig. 3 and Fig. 4.

Fig. 3. Illustrations of Master Trajectories.

Fig. 4. Illustrations of Slave Trajectories.

In conclusion, all of the required fundamental calculations have been finished, and the completed Simscape model will apply their results to implement the simulation of the practical operation of the ETA workstation. Hence, the view of the dynamic model and the simulation results are introduced in the subsequent section.

III. SIMULATION DEMONSTRATION

A. View of Dynamic Model and Rotation Simulation

With the virtual entities derived from the Creo modeling, the dynamic model has the same appearance as B&R's ETA workstation, and its general views are displayed in Fig. 5. Furthermore, combining the computations based on Lagrange's Equation and the Trajectory Plan, the digital actuation system, which is equivalent to the electronic system, actuates the model to execute the desired trajectories. Moreover, the movements of the mechanisms are recorded by the sensor function blocks in the MATLAB toolbox, and the resulting diagrams are Fig. 6 and Fig. 7.
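As a sanity check of the Torque-Mode converter described in Section II-C, the sketch below (Python rather than Simscape; J, b, and k are illustrative values chosen for a self-consistent deg-based unit system, not converted from Table I) computes the feedforward torque τ = Jθ̈_d + bθ̇_d + kθ_d from the planned master trajectory (2.1) and integrates the plant of Equation (1.2):

```python
# Desired master trajectory from Equation (2.1) and its derivatives.
def theta_d(t):  return 0.216*t**5 - 5.4*t**4 + 36.0*t**3    # position (deg)
def omega_d(t):  return 1.08*t**4 - 21.6*t**3 + 108.0*t**2   # velocity (deg/s)
def alpha_d(t):  return 4.32*t**3 - 64.8*t**2 + 216.0*t      # acceleration (deg/s^2)

# Illustrative plant parameters: k and b follow Section II-C numerically;
# J is an assumed value in the same deg-based unit system.
J, b, k = 0.01, 0.003, 0.5

def simulate_master(dt=1e-4, T=10.0):
    """Integrate J*th'' + b*th' + k*th = tau with the feedforward torque
    tau(t) = J*alpha_d + b*omega_d + k*theta_d (semi-implicit Euler)."""
    th, w = 0.0, 0.0
    for i in range(int(T / dt)):
        t = i * dt
        tau = J*alpha_d(t) + b*omega_d(t) + k*theta_d(t)  # Lagrange converter output
        a = (tau - b*w - k*th) / J                         # plant of Equation (1.2)
        w += a * dt
        th += w * dt
    return th   # ends close to theta_d(10) = 3600 deg
```

Because the feedforward torque matches the plant model exactly, the simulated axis reproduces the planned 3600-deg rotation up to integration error, which is the behaviour the digital actuation system relies on.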

TABLE V. CONSTRAINTS FOR CALCULATION OF TRAJECTORY PLAN

Symbol   Quantity               Value (Units)
θ_0      initial position       0 deg
θ_t      target position        3600 deg
v_0      initial velocity       0 deg/s
v_t      target velocity        0 deg/s
α_0      initial acceleration   0 deg/s²
α_t      target acceleration    0 deg/s²
t_0      starting time          0 s
t        ending time            10 s
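The coefficients of Equations (2.1) and (2.2) follow directly from the Table V constraints. For a rest-to-rest move with zero boundary velocity and acceleration, the quintic reduces to the standard closed form θ(t) = Δθ·(10(t/T)³ − 15(t/T)⁴ + 6(t/T)⁵); a quick check in Python (the closed form is textbook material, not quoted from the paper):

```python
def quintic_coeffs(delta, T):
    """t^3, t^4, t^5 coefficients of the rest-to-rest quintic
    theta(t) = delta * (10(t/T)^3 - 15(t/T)^4 + 6(t/T)^5),
    which satisfies zero velocity and acceleration at t = 0 and t = T."""
    return 10.0*delta/T**3, -15.0*delta/T**4, 6.0*delta/T**5

# Master axis: delta = 3600 deg over T = 10 s (Table V)
a3, a4, a5 = quintic_coeffs(3600.0, 10.0)   # -> 36.0, -5.4, 0.216, as in (2.1)
# The slave coefficients of (2.2) are the negatives (opposite rotation).
```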

Fig. 5. Appearance of Simscape Dynamic Model


Fig. 6. Simulating Trajectories of Master Axis.

Fig. 7. Simulating Trajectories of Slave Axis.

In Fig. 6 and 7, noise occurs in the acceleration diagrams. This phenomenon is caused by the step difference: because of the variable step length of the simulation, the transitions of the curves are not regular. However, the other figures show that the virtual mechanism executes the expected motion accurately.

B. Collision Simulation and Detection

Consequently, the model can detect the collision in the digital environment with a subsystem implemented by the designed algorithm. However, because of the ideal rotation, one axis will not interact with the other. Hence, to test the detecting function, the movement of the slave axis is manually eliminated, and the collision situation between the plastic plate and the breaches of the master gear is indicated in Fig. 8.

Fig. 8 shows the collisions between the point "A'01" and all of the contact points of the left master breach. Each pulse in the diagram indicates that one interaction has happened. Therefore, the detecting system is working, and this ideal model can be applied to verify the reliability of the planned trajectories for practical double-axis mechatronic systems.

Fig. 8. Collision Report Diagram.

IV. CONCLUSION

In conclusion, this project has achieved partial success. The current dynamic model can verify the correctness of the desired moving trajectories for a practical double-axis system such as B&R's ETA workstation. Nevertheless, compared with the Digital Twin technique, the model is too theoretical, which keeps the simulation away from the actual mechatronic instruments. Firstly, the Simscape model does not apply error compensators to simulate situations in which the system operation is affected by external disturbances. Although such errors decrease the performance of the mechanical structure, the equipment is inevitably influenced by external disturbances in practical operation. Moreover, the feedback control system is not built into the model because there is no error waiting to be compensated. Thus, the improvement is to create a compensator for simulating the influence of the errors, and a PLC should be imported into the current structure to compensate for the differences between the expected motion and the simulation.

ACKNOWLEDGMENT

This research was financially supported by the AI Centre at Xi’an Jiaotong-Liverpool University. The authors would like to thank all the parties concerned.

REFERENCES

[1] D. Chen and H. Zhang, “A multi-axis synchronous register control system for color rotogravure printer,” in 2006 International Technology and Innovation Conference (ITIC 2006), 2006, pp. 1913–1918.
[2] S.-K. Jeong and S.-S. You, “Precise position synchronous control of multi-axis servo system,” Mechatronics, vol. 18, no. 3, pp. 129–140, 2008.
[3] L. Feng, Y. Koren, and J. Borenstein, “Cross-coupling motion controller for mobile robots,” IEEE Control Systems Magazine, vol. 13, no. 6, pp. 35–43, 1993.
[4] D. Sun, “Position synchronization of multiple motion axes with adaptive coupling control,” Automatica, vol. 39, no. 6, pp. 997–1005, 2003.
[5] G. Zhong, Z. Shao, H. Deng, and J. Ren, “Precise position synchronous control for multi-axis servo systems,” IEEE Transactions on Industrial Electronics, vol. 64, no. 5, pp. 3707–3717, 2017.
[6] A. Hughes and B. Drury, Electric Motors and Drives: Fundamentals, Types and Applications. Newnes, 2019.
[7] 8LS three-phase synchronous motors user’s manual, 2005. [Online]. Available: https://download.br-automation.com/BRP44400000000000000505082/8LSAC...-0%20Users%20manual%20Motor-V0%20Manual-V1.pdf?px-hash=7f3eae6a2a85ef14a126a31654b5998a&px-time=162218273
[8] Motion control: electronic gears and cam profiles, 2017. [Online]. Available: https://download.br-automation.com/BRP44400000000000000514215/TM441TRE.433-ENGMotion%20Control%20electronic%20gears%20and%20cam%20profilesMpAxisV4330.pdf?px-hash=040638197a792328a9298c3fefda06ca&px-time=1622183255
[9] H. Benaroya, Mechanical Vibration: Analysis, Uncertainties and Control. CRC Press, 2004.
[10] G. Ogar and J. D’Azzo, “A unified procedure for deriving the differential equations of electrical and mechanical systems,” IRE Transactions on Education, no. 1, pp. 18–26, 1962.
[11] P. J. McKerrow, Introduction to Robotics, vol. 3. Addison-Wesley Sydney, 1991.
