
A publication of ISCA*: International Society for Computers and Their Applications

INTERNATIONAL JOURNAL OF COMPUTERS AND THEIR APPLICATIONS

TABLE OF CONTENTS

Page

Guest Editorial: Selected Papers from CATA 2018 ...... 103 Gordon Lee and Les Miller

Labeling Methods Using EEG to Predict Drowsy Driving with Facial Expression Recognition Technology ...... 104 Daichi Naito, Ryo Hatano, and Hiroyuki Nishiyama

LMS Performance Issues: A Case Study of D2L ...... 113 Sourish Roy, Carey Williamson, and Rachel McLean

An Evaluation of the Performance of Join Core and Join Indices Query Processing Methods ...... 123 Reham M. Almutairi, Mohammed Hamdi, Feng Yu, and Wen-Chi Hou

Novel Low Latency Load Shared Multicore Multicasting Schemes – An Extension to Core Migration ...... 132 Bidyut Gupta, Ashraf Alyanbaawi, Nick Rahimi, Koushik Sinha, and Ziping Liu

* “International Journal of Computers and Their Applications is abstracted and indexed in INSPEC and Scopus.”

Volume 25, No. 3, Sept. 2018 ISSN 1076-5204

International Journal of Computers and Their Applications A publication of the International Society for Computers and Their Applications

EDITOR-IN-CHIEF

Dr. Frederick C. Harris, Jr., Professor Department of Computer Science and Engineering University of Nevada, Reno, NV 89557, USA Phone: 775-784-6571, Fax: 775-784-1877 Email: [email protected], Web: http://www.cse.unr.edu/~fredh

ASSOCIATE EDITORS

Dr. Hisham Al-Mubaid, University of Houston-Clear Lake, USA, [email protected]
Dr. Wen-Chi Hou, Southern Illinois University, USA, [email protected]
Dr. James E. Smith, West Virginia University, USA, [email protected]
Dr. Shamik Sural, Indian Institute of Technology Kharagpur, India, [email protected]
Dr. Ramesh K. Karne, Towson University, USA, [email protected]
Dr. Antoine Bossard, Advanced Institute of Industrial Technology, Tokyo, Japan, [email protected]
Dr. Ramalingam Sridhar, The State University of New York at Buffalo, USA, [email protected]
Dr. Mark Burgin, University of California, Los Angeles, USA, [email protected]
Dr. Bruce M. McMillin, Missouri University of Science and Technology, USA, [email protected]
Dr. Junping Sun, Nova Southeastern University, USA, [email protected]
Dr. Muhanna Muhanna, Princess Sumaya University for Technology, Amman, Jordan, [email protected]
Dr. Sergiu Dascalu, University of Nevada, USA, [email protected]
Dr. Jianwu Wang, University of California San Diego, USA, [email protected]
Dr. Sami Fadali, University of Nevada, USA, [email protected]
Dr. Mehdi O. Owrang, The American University, USA, [email protected]
Dr. Yiu-Kwong Wong, Hong Kong Polytechnic University, Hong Kong, [email protected]
Dr. Vic Grout, Glyndŵr University, Wrexham, UK, [email protected]
Dr. Xing Qiu, University of Rochester, USA, [email protected]
Dr. Yi Maggie Guo, University of Michigan, Dearborn, USA, [email protected]
Dr. Abdelmounaam Rezgui, New Mexico Tech, USA, [email protected]
Dr. Rong Zhao, The State University of New York at Stony Brook, USA, [email protected]

ISCA Headquarters • P. O. Box 1124, Winona, MN 55987 USA • Phone: (507) 458-4517 • E-mail: [email protected] • URL: http://www.isca-hq.org

Copyright © 2018 by the International Society for Computers and Their Applications (ISCA). All rights reserved. Reproduction in any form without the written consent of ISCA is prohibited.

Guest Editorial: Selected Papers from CATA 2018

CATA (Computers and Their Applications) is one of the flagship conferences of the International Society for Computers and Their Applications. The intent of this conference has been to blend theory and practice, as a means of stimulating researchers from both research dimensions. The papers in this special issue are extensions of their conference versions, and illustrate the spectrum of the 38 papers presented at the CATA 2018 conference.

Paper 1: Labeling Method Using EEG to Predict Drowsy Driving with Facial Expression Recognition Technology. Authors: Daichi Naito, Ryo Hatano, and Hiroyuki Nishiyama. In this paper, the authors develop a non-contact method for predicting a driver’s drowsiness, based on a set of classifiers that use a combination of EEG readings and facial expression capture for the same time period. They use a 3D camera, and employ several machine learning methods as well as automated labeling for prediction. The Gaussian kernel SVM gave the best performance, and the results show that their approach provides a viable and efficient means of predicting a driver’s drowsiness.

Paper 2: LMS Performance Issues: A Case Study of D2L. Authors: Sourish Roy, Carey Williamson, and Rachel McLean. The paper focuses on a particular networked application, the Learning Management System at the University of Calgary. The authors study its traffic flow, identify the causes of traffic congestion, offer several solutions to the congestion problem, and present a web cache simulation study to show that this simple solution alleviates the traffic problems.

Paper 3: Performance Evaluations of Two Fast Join Query Processing Methods: Join Core and Join Indices. Authors: Reham M. Almutairi, Mohammed Hamdi, Feng Yu, and Wen-Chi Hou. An evaluation of the performance of the relational database model is provided in this paper. In particular, the authors compared the performance of the Jive-join algorithm, the Join Core algorithm, and the MySQL implementation. It is shown that Join Core performs faster than the join indices, although both methods perform well on TPC-H benchmark datasets.

Paper 4: Novel Low Latency Load Shared Multicore Multicasting Schemes – An Extension to Core Migration. Authors: Bidyut Gupta, Ashraf Alyanbaawi, Koushik Sinha, and Nick Rahimi. The authors present four schemes for load shared multicasting using static networks. Two of the methods consider load sharing as the performance metric, while the other two also focus on low latency multicasting. Further, the authors discuss the feasibility of these strategies.

Gordon Lee (Program Chair, CATA 2018) and Les Miller (General Chair, CATA 2018)


Labeling Method Using EEG to Predict Drowsy Driving with Facial Expression Recognition Technology

Daichi Naito*, Ryo Hatano*, and Hiroyuki Nishiyama* Tokyo University of Science, Yamazaki 2641, Noda-shi, CHIBA, 278-8510, JAPAN

Abstract

Accidental death caused by driver negligence is a serious problem, and the most common cause of such accidents is careless driving, in particular, driving while drowsy. Drowsy driving can be caused by overwork and may lead to serious accidents. Studies have proposed using an electroencephalogram (EEG), pulse, and facial expressions to predict drowsy driving. Predicting drowsy driving using an EEG or pulse, however, may require the driver to wear cumbersome devices, which is likely to disturb their concentration while driving. We therefore aim to predict drowsy driving from a driver's facial expressions in a non-contact manner. In particular, we record facial expressions with a 3D camera, and predict drowsy driving using machine learning. This, however, requires visually checking the driver’s state to label the data, which is time consuming and makes it difficult to prepare sufficient labeled data for machine learning. In this study, we propose an automated method for labeling, using a machine learning technique based on EEG data. Evaluation of the proposed method is conducted, based on accuracy and the F-value, for unknown data prediction.

Key Words: Drowsy driving, machine learning, labeling, EEG, facial expressions.

1 Introduction

Deaths caused by accidents resulting from driver negligence have become a serious problem in recent years. Figure 1 shows the yearly trend in causes of fatal accidents due to driver negligence in Japan [7]. The Metropolitan Police Department Transportation Bureau reports that the most common cause of accidental death from driver negligence is careless driving. It would therefore be greatly beneficial if the rate of careless driving could be reduced.

“Careless driving” refers to situations in which the driver is unable to concentrate on driving, such as driving when drowsy. As drowsy driving can cause significant traffic accidents, we focus on predicting its occurrence. Much of the literature on drowsy driving prediction deals with the driver’s pulse, electroencephalogram (EEG), and facial expressions, among other indicators [3]. To predict drowsy driving using an EEG or pulse, it is necessary to attach a device to the driver’s body, and this may disturb their concentration while driving, making the use of such attachable devices to predict drowsy driving impractical or undesirable.

We therefore aim to predict drowsy driving with a non-contact approach. To that end, we record drivers’ facial expressions using a 3D camera, and predict drowsy driving with machine learning. However, developing an effective machine learning model requires a considerable amount of time and effort, as it is necessary for humans to watch many videos of drivers to categorize (label) the training data. This makes it difficult to prepare enough training data to obtain a generalized prediction model, an issue which probably also affects other studies that employ an approach similar to ours. To tackle this problem, we propose a new labeling approach, using EEG data to shorten the time required for labeling, as it is known that an EEG is helpful for determining the level of sleepiness [1, 6].

We organize the rest of this paper as follows: Section 2 summarizes several studies with respect to sleeping and drowsy driving; Section 3 presents an overview of our method and the principles of machine learning; Section 4 shows the experimental environment and the analysis of our results; and Section 5 concludes this paper with further remarks.

2 Related Works

Nishiyama et al. [5] proposed a method to predict drowsy driving using drivers’ facial expression data, obtained with a 3D camera. They applied inductive logic programming to the data, and obtained a set of rules that could be interpreted as corresponding formulas of first-order logic. The obtained rules involved expressions that allowed for prediction of drowsy driving 10 s before its occurrence. Nevertheless, they faced the aforementioned problem of collecting enough training data, owing to the time required to label videos. For this reason, there was a risk of overfitting such a model, and increasing the size of the training dataset was viewed as imperative.

______
* Department of Industrial Administration, Faculty of Science and Technology. Email: [email protected].


Figure 1: Trend in the number of fatal accidents by law violation

Tarek et al. [2] classified the sleep stage of a person with a support vector machine (SVM) using EEG, electrooculogram (EOG), and electromyogram (EMG) data. Their study involved the classification of whether a subject was sleeping throughout the night, and hence this approach could not be directly applied, for our purpose, to the classification of drowsy driving. Moreover, the device used appeared to be too large to attach to subjects.

To predict drowsy driving precisely, using facial expressions occurring while driving, a large training set is required. In this study, we propose a labeling method using EEGs to construct such a training set efficiently.

3 A Method for Automatic Data Labeling

3.1 Overview of our Method

Our method is roughly divided into two phases, as shown in Figure 2, and in this method, the experiment is performed twice. Firstly, we record the EEG during driving (Experiment 1), and secondly, we record both the EEG and facial expressions (Experiment 2). In both Experiments 1 and 2, we video the driver to verify their driving behavior after the experiments. The EEGs are recorded by an electroencephalograph, and the driver's facial expressions are recorded with a 3D camera. To build a classifier in Phase 1, we use the recorded video to manually label the state of the driver for the EEG dataset acquired in Experiment 1. In Phase 2, we apply the EEG dataset acquired in Experiment 2 to the classifier obtained from Phase 1; the classification results are then used to label the facial expression dataset acquired in Experiment 2. Once we obtain the classifier from Phase 1, we can automatically obtain a labeled dataset using it in Phase 2.

3.2 Creating a Dataset

In this study, we define labels for two states: awake and drowsy. In general, EEGs are divided into five types by frequency range - α, β, γ, δ, and θ waves. The amount of EEG activity is also thought to change depending on the actions of the brain [8]. In this study, we categorize each


Figure 2: Proposed Method Flow

EEG into one of eight types (see also Table 2): α1, α2, β1, β2, γ1, γ2, δ, and θ waves. The definition of these types is based on the specification of the electroencephalograph used in this study [4]. Here, we define the set E = {α1, α2, β1, β2, γ1, γ2, δ, θ} of EEG types for convenience. The electroencephalograph used in this study can acquire the power spectral density for the range of each of the above eight types of EEG, every second.

As we deal with time series data, it is necessary to extract the feature quantity as one window every few seconds, and it is also important to be able to determine features in real time. For this reason, we extract the features using a sliding window (the size of a window is 5 s) that shifts every second. The sliding window technique is often used for time series data, and shifting the window every second means that prediction can be performed with 5 s of data every second, which allows us to construct a rich dataset for the training data. Each window of 5 s was labeled as either “drowsy” or “awake”, according to the following conditions:

・ Drowsy: a driver is drowsy for more than 3 s.
・ Awake: a driver remains awake for more than 5 s.

We excluded instances of drowsy driving lasting 1-2 s, since the status of a person in this case seemed ambiguous.

From each of the acquired EEG types, we extract features of the sliding window defined above. In Table 1, we present all features used in this study, where e ∈ E, and Per(e) is the average percentage of the EEG for 5 s, as defined in Formula (1):

    Per(e) = (1/5) Σ_{t=1..5} [ e_t / ( Σ_{d∈E} d_t ) ]        (1)

Table 1: Features

    Feature Name    Corresponding Formula    Description
    f1~f8           Ave(e)                   Average value of e ∈ E for 5 s
    f9~f16          Var(e)                   Variance of e ∈ E for 5 s
    f17~f24         Max(e)                   Maximum value of e ∈ E for 5 s
    f25~f32         Min(e)                   Minimum value of e ∈ E for 5 s
    f33~f40         Per(e)                   Average percentage of e ∈ E for 5 s
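To make the windowing concrete, the Python sketch below computes the 40 features of Table 1 from a stream of per-second band powers, using Formula (1) for Per(e). The band names and the one-dict-per-second row format are our own assumptions for illustration, not the paper's actual data layout.

```python
BANDS = ["alpha1", "alpha2", "beta1", "beta2",
         "gamma1", "gamma2", "delta", "theta"]  # the eight EEG types in E (assumed names)
WINDOW = 5  # window size in seconds (one power-spectral-density row per second)

def window_features(rows):
    """Compute the 40 features of Table 1 (Ave, Var, Max, Min, Per for each
    band in E) from one 5-row window; each row maps band name -> power."""
    feats = {}
    for e in BANDS:
        vals = [r[e] for r in rows]
        ave = sum(vals) / WINDOW
        feats["Ave(%s)" % e] = ave
        feats["Var(%s)" % e] = sum((v - ave) ** 2 for v in vals) / WINDOW
        feats["Max(%s)" % e] = max(vals)
        feats["Min(%s)" % e] = min(vals)
        # Formula (1): Per(e) = (1/5) * sum over t of e_t / (sum over d in E of d_t)
        feats["Per(%s)" % e] = sum(r[e] / sum(r[d] for d in BANDS)
                                   for r in rows) / WINDOW
    return feats

def sliding_windows(series):
    """Yield one 5-s feature window per second (the window shifts by 1 s)."""
    for t in range(len(series) - WINDOW + 1):
        yield window_features(series[t:t + WINDOW])
```

Because the window shifts by one second, n seconds of recording yield n - 4 overlapping feature vectors, which is what makes the training dataset "rich" in the sense described above.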


4 Experiment

4.1 Environment

The subjects taking part in the experiment were four students in their twenties. They drove for about 5 hours each, with short breaks. For the electroencephalograph, we used NeuroSky's Mind Wave Mobile, which can send EEG data to a computer via Bluetooth. The EEG frequencies obtained by Mind Wave Mobile were divided into eight types, as shown in Table 2.

Table 2: EEG types

    EEG Type    Frequency
    γ2          41~50 Hz
    γ1          31~40 Hz
    β2          18~30 Hz
    β1          13~17 Hz
    α2          10~12 Hz
    α1          8~9 Hz
    θ           4~7 Hz
    δ           1~3 Hz

Figure 3 provides an illustration of a person wearing the Mind Wave Mobile and the data obtained from it. By attaching the electrodes to the forehead and the earlobe, we obtained a power spectral density for each type of EEG every second. The acquired data was saved as a CSV file, in which rows are the time series data, and columns represent the different EEG types. To obtain the driver’s facial expression, we used Intel’s RealSense SR300 as a 3D camera. Figure 4 shows an example of a facial state captured by the camera, and the 3D coordinates of the acquired points. Using infrared ray data, the camera outputs the 3D coordinates of 78 points on a face at almost 30 fps. Besides these data, we recorded movies with RealSense to judge visually whether subjects were drowsy, for future studies, and we will label these coordinates with Phase 2 of the proposed method. We emphasize that, in this paper, we labelled these coordinates using EEG data only.

We used a driving simulator, so as to observe the subjects’ driving without putting them at risk when they became drowsy. Specifically, we reproduced a driving environment by projecting the PlayStation 2 Shutoko Battle (also known as “ ” in the US, and “Tokyo Highway Challenge” in Europe) on a screen with a projector. To simulate the driving environment more precisely, we employed Logicool’s GT FORCE for the pedals and a steering wheel, thereby allowing operation of the simulated cars with steering wheels and pedals, instead of gamepad controllers. Our equipment was arranged as shown in Figure 5, and the experiments were conducted in a dark room to encourage sleepiness in the driver.

Figure 3: Example of EEG data obtained from Mind Wave Mobile


Figure 4: Example of the 78 points recorded in a facial expression

Figure 5: Experimental environment


4.2 Preliminary Experiment

To select an appropriate supervised learning model for classification of the EEG data, we compared the following classifiers: k-nearest neighbors (KNN), SVM, linear discriminant analysis, and a random forest. For SVM, we tested both a linear kernel and a Gaussian kernel, with certain parameters. We verified the performance of each trained classifier by 10-fold cross validation, using data from a driver whose drowsy driving was observed for an hour. The breakdown of this dataset was 1,010 instances of the drowsy state, and 1,649 instances of the awake state.

Figure 6 is a scatter diagram of Ave(α1) versus Ave(θ), calculated from the raw data, i.e., the recorded CSV file. As shown in Figure 6, one can observe a large difference between the portion where the positive data and the negative data overlap, and the maximum value of the negative data. To capture possible boundaries in these data more effectively, we computed the common logarithm of the raw data. Figure 7 is a scatter diagram of Ave(α1) versus Ave(θ), generated with common logarithm


Figure 6: Scatter diagram of average values of raw data

Figure 7: Scatter diagram of average values of the common logarithm of the data


values, and it can be seen that this technique allowed the boundary between the positive and negative data to be distinguished more easily.

As usual, we evaluated each learning model by its accuracy and F-value, which are defined as follows:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)                    (2)

    Precision = TP / (TP + FP)                                    (3)

    Recall = TP / (TP + FN)                                       (4)

    F-value = (2 × Recall × Precision) / (Recall + Precision)     (5)

In these equations, TP, FP, TN, and FN are defined as follows:

・ TP: Total count for which a drowsy state was judged as drowsy
・ FP: Total count for which an awake state was wrongly judged as drowsy
・ TN: Total count for which an awake state was judged as awake
・ FN: Total count for which a drowsy state was wrongly judged as awake

The accuracy and F-value of each learning model are shown in Table 3, and from these results, we can observe that KNN, the random forest, and the Gaussian kernel SVM were appropriate for classifying our dataset. Results obtained using the common logarithm were also better than those from the raw data. We therefore used features based on the common logarithm in what follows, except for Per(e). The reason we did not do so for Per(e) is that its common logarithm became very close in value for each type of EEG data (cf. Figure 7); hence, we calculated Per(e) directly from the raw data.

4.3 Evaluation of our Proposed Method

We evaluated the proposed method based on prediction results for unknown data with respect to the created classifiers. We created each classifier with a training set consisting of an hour of data for the driver used in the preliminary experiment, and then evaluated the classifiers based on prediction results for two hours of test data from the same driver. We used KNN, a random forest, and a Gaussian kernel SVM, based on the results of our preliminary experiment.

Figure 8 shows the accuracy and F-value for each learning model, illustrating that SVM was best in terms of both accuracy and the F-value. Using SVM, we can classify unknown data from the same subject with high accuracy, and a high F-value. KNN achieved the best results in the preliminary experiment, whereas it performed worst in this experiment. In summary, it seems best to use the Gaussian kernel SVM, with a certain parameter, as the classifier for our proposed method.

4.4 Verification of the Influence of the Size of the Training Dataset

We reviewed whether the size of our dataset affected the prediction results. We divided our dataset, which contained three hours of data, into three equal parts, and used k ∈ {1, 2} hours as training data, and 3 - k hours as test data for each. Table 4 shows the accuracy and F-values for each setting. Based on these results, we can obtain reasonable accuracy and F-values with a single hour of training data, while two hours of training data led to a better outcome, in terms of F-value.
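Formulas (2)-(5) map directly onto code. The following Python sketch computes all four metrics from the four counts (the variable names are ours):

```python
def evaluate(tp, fp, tn, fn):
    """Compute Accuracy, Precision, Recall, and F-value (Formulas (2)-(5))
    from the counts of true/false positives and negatives."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)               # Formula (2)
    precision = tp / (tp + fp)                               # Formula (3)
    recall = tp / (tp + fn)                                  # Formula (4)
    f_value = 2 * recall * precision / (recall + precision)  # Formula (5)
    return accuracy, precision, recall, f_value
```

Note that the F-value is the harmonic mean of precision and recall, so it penalizes a model that trades one heavily for the other, which is useful here because the drowsy and awake classes are imbalanced.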

Table 3: Results of the preliminary experiment

                                KNN     Linear        Random   Gaussian     Linear
                                        Discriminant  Forest   Kernel SVM   Kernel SVM
                                        Analysis
    Raw Data        Accuracy   89.0%   80.2%         89.4%    85.5%        82.1%
                    F-Value    91.6%   66.2%         90.3%    88.3%        84.3%
    Common          Accuracy   94.4%   87.3%         91.5%    93.1%        89.1%
    Logarithm Data  F-Value    95.3%   85.7%         91.3%    92.6%        85.8%
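The 10-fold cross validation used to produce these comparisons can be sketched as follows. This is a minimal pure-Python illustration, with a toy 1-nearest-neighbour trainer standing in for the actual KNN/SVM/LDA/random-forest models (which the paper presumably trained with a standard machine learning library):

```python
def k_fold_accuracy(data, labels, train_fn, k=10):
    """Estimate classifier accuracy by k-fold cross validation.
    train_fn(train_x, train_y) must return a predict(x) function."""
    n = len(data)
    fold = n // k
    correct = 0
    for i in range(k):
        lo = i * fold
        hi = (i + 1) * fold if i < k - 1 else n  # last fold takes the remainder
        test_x, test_y = data[lo:hi], labels[lo:hi]
        train_x, train_y = data[:lo] + data[hi:], labels[:lo] + labels[hi:]
        predict = train_fn(train_x, train_y)
        correct += sum(predict(x) == y for x, y in zip(test_x, test_y))
    return correct / n

def train_1nn(xs, ys):
    """Toy 1-nearest-neighbour 'trainer' over scalar features."""
    def predict(x):
        best = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[best]
    return predict
```

Each of the k folds is held out once as test data while the remaining folds train the model, so every labeled window contributes to exactly one test set.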

Figure 8: Evaluation results

Table 4: Results for different amounts of training data

    Amount of Training Data    Accuracy    F-Value
    One Hour                   87.4%       73.7%
    Two Hours                  90.7%       82.1%

manually label the dataset, which takes considerable time and effort. For this reason, preparing enough training data has been a problem, and to resolve this, we proposed a new labeling method using EEG data. We used the power spectral density for each type of EEG acquired from the electroencephalograph, and then, from each of the acquired EEG types, we extracted features for each 5 s window, created with a sliding window. In a preliminary experiment, we examined learning models suitable for our method, and found that SVM, KNN, and random forest performed best. In addition, to help clarify our results, we extracted features using the common logarithm of the raw data.

Next, we evaluated our method on new data from the same subject, and in this evaluation, the Gaussian kernel SVM achieved the highest accuracy, at 91.6%, and an F-value of 80.1%, indicating that the SVM technique was the most suitable for our method.

Labeling unknown data from the same person seems possible, although we have not verified its benefit with other test subjects. In the future, we hope to add drowsy driving data from other drivers, and confirm that our classifier system works as well with them. Moreover, for this study, we labeled facial expressions using EEG data; however, in future work, we would review the correlation between EEGs and facial expressions, with the intention of confirming that use of facial expressions alone would allow prediction of drowsy driving. While we still need to test our results with more subjects and different age groups - in this experiment we only used test subjects in their twenties - our current work indicates that, from this correlation, it might be possible to estimate brain state, and therefore predict drowsy driving, using detailed analysis of facial expressions alone.

References

[1] American Academy of Sleep Medicine, “The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications”, Westchester: AASM, 2007.

[2] Tarek Lajnef, Sahbi Chaibi, Perrine Ruby, Pierre-Emmanuel Aguera, Jean-Baptiste Eichenlaub, Mounir Samet, Abdennaceur Kachouri, and Karim Jerbi, “Learning Machines and Sleeping Brains: Automatic Sleep Stage Classification using Decision-Tree Multi-Class Support Vector Machines”, Journal of Neuroscience Methods, 250:94-105, 2015.

[3] Boon-Giin Lee, Boon-Leng Lee, and Wan-Young Chung, “Wristband-Type Driver Vigilance Monitoring System using Smartwatch”, IEEE Sensors Journal, 15(10):5624-5633, 2015.


[4] “NeuroSky Support Site: What are the Different EEG Band Frequencies?”, http://support.neurosky.com/kb/science/eeg-band-frequencies, August 2009, (visited: 29th June 2018).

[5] Hiroyuki Nishiyama, Yusuke Saito, and Hayato Ohwada, “Machine Learning to Detect Drowsy Driving by Inductive Logic Programming using a 3D Camera”, Proc. of the 2016 International Symposium on Semiconductor Manufacturing Intelligence, ID 42, 6 pp., 2016.

[6] Allan Rechtschaffen and Anthony Kales, A Manual of Standardized Terminology, Techniques, and Scoring Systems for Sleep Stages of Human Subjects, 1968.

[7] Tokyo Metropolitan Police Department Transit Authority, “About Traffic Death Accidents in 2016”, https://www.npa.go.jp/toukei/koutuu48/H28_jiko.pdf, February 2017, (visited: 21st November 2017).

[8] Madoka Yamazaki and Masato Sugiura, “The EEG of Adults and Elderly People”, (in Japanese), Clinical Neurophysiology, 42(6):387-392, 2014.

Daichi Naito is a master’s student at Tokyo University of Science. He received a B.S. from Tokyo University of Science in 2017. He majors in Industrial Administration and is interested in data science and machine learning. His e-mail address is [email protected].

Ryo Hatano is an Assistant Professor at Tokyo University of Science. He obtained his Ph.D. from the Japan Advanced Institute of Science and Technology in 2017. His research interests span both mathematical logic and machine learning for artificial intelligence. Most of his work has been on developing reasoning systems and linear algebraic semantics in dynamic epistemic logic. His e-mail address is [email protected].

Hiroyuki Nishiyama is a Professor at Tokyo University of Science. He obtained his Ph.D. from Tokyo University of Science in 2000. He is interested in the design of human-machine cooperative systems, user interfaces, multi-agent systems, and network security based on Artificial Intelligence. He is a member of IEEE and ACM. His e-mail address is [email protected].


LMS Performance Issues: A Case Study of D2L

Sourish Roy*, Carey Williamson*, and Rachel McLean* University of Calgary, Calgary, AB, CANADA T2N 1N4

Abstract

In this article, we present a network traffic measurement study of the Learning Management System (LMS) used by the University of Calgary, which is called Desire2Learn (D2L). The motivation for our study comes from anecdotal reports of sluggish D2L performance, particularly for file uploads. Using a combination of active and passive network measurement techniques, we are able to identify the root causes of the poor D2L performance. The main issues identified are: (1) excessive HTTP redirections in our university’s D2L setup; (2) non-negligible round-trip times (RTTs) to the server hosting the D2L content; and (3) default TCP window size settings that limit the maximum end-to-end throughput. We discuss these issues, and identify potential solutions, including improved protocol configurations, and the use of Web proxy caching. Our discrete-event simulation results indicate that a simple Web proxy cache at our university could reduce D2L request traffic by 50-70%, which would substantially improve user-perceived D2L performance.

Key Words: Learning management system (LMS), network traffic measurement, user-perceived performance, web-based systems, TCP, throughput, response time.

1 Introduction

A Learning Management System (LMS) is an important part of the IT infrastructure at many universities, helping to support their educational mandate. LMS technology augments classroom learning with support for online learning, by providing a single repository for all course content, and a unified environment for teaching-learning interactions between faculty and students. It provides instructors with tools to create, manage, and deliver course content, while providing students with access to content at their own pace, to support onsite, remote, or asynchronous learning. An LMS also allows instructors to check student participation and engagement in the course curriculum, and to assess student performance on assignments, quizzes, and exams.

The concept of e-learning has been around for many years, but it is only relatively recently that the deployment of Web-based LMS software has become ubiquitous at educational institutions. Commercial LMS solutions today include Blackboard, Canvas, D2L, and others, as well as open-source solutions such as Moodle. Prior to Internet deployment, most LMS solutions were closed, proprietary, or custom systems within individual institutions.

One popular LMS is Brightspace by D2L (Desire2Learn), which we refer to as “D2L” in this paper [7]. D2L is a Canadian company headquartered in Kitchener, Ontario, Canada. It has more than 800 employees around the world, including in Canada, Australia, Brazil, Singapore, the UK, and the USA. D2L was started by John Baker in 1999 as a system to manage courses and student learning. A particular feature of D2L is the tracking and modeling of student learning activities, making this information available to instructors. D2L also facilitates the uploading, grading, and return of assignments to students with appropriate feedback.

In 2014, the University of Calgary selected D2L as its new LMS, replacing Blackboard. Since then, every instructor and student has been given access to D2L, and more and more courses are now available within it. There are tens of thousands of on-campus users who use D2L each day to create/view course content, record/watch lectures, and enter/view grades. These LMS activities generate a lot of traffic on our campus network, using thousands of TCP connections, from many IP addresses and multiple heterogeneous devices. Knowledge of the D2L traffic patterns can help us understand its impact on the learning environment at the University of Calgary. Thus, the primary motivation for our work is a workload characterization study of D2L usage on our campus [17]. A secondary motivation for our work is user-perceived D2L performance. Specifically, anecdotal reports from faculty and students at the University of Calgary indicate that D2L is “slow”. This problem has existed since 2014, but has not yet been resolved.

One underlying reason for the sluggishness of D2L is that the content is hosted remotely in Ontario, approximately 3200 km from Calgary, and thus has a non-negligible network round-trip time (RTT) for access. Indeed, our network measurements confirm that this network latency is an important contributing factor. However, we also find other technical issues with the D2L configuration that hamper its user-perceived performance. For example, file uploads for content producers (i.e., faculty/staff) are much slower than file downloads for content consumers (i.e., students).

An analysis of D2L traffic can provide insights into the

______
* Department of Computer Science, Email: {sourish.roy, cwill, rarmclea}@ucalgary.ca.

ISCA Copyright© 2018

114 IJCA, Vol. 25, No. 3, Sept. 2018

reasons for its slow performance. In computer networking, traffic measurement and analysis are crucial to the design, operation, and maintenance of local and wide-area networks. This area of research [11] is used extensively in both academia and industry. By collecting network traffic measurement data, we can assess the usage of a network, develop improved communication protocols, and improve the design of future networks. For example, one common technique is to use content caching in a network, to reduce traffic volumes and lower the network latency for popular content. As part of our work, we explore the applicability of content caching on campus to reduce requests for remote D2L content.

The rest of this paper is organized as follows. Section 2 provides some background on network traffic measurement, and prior research literature. Section 3 describes the research methods used for our work. Section 4 presents the results from our study, focusing on HTTP redirection, network latency, and TCP throughput. Section 5 presents more recent measurement results to further investigate D2L performance, including caching. Section 6 concludes the paper.

2 Background and Related Work

Network traffic measurement is a well-established technique to analyze network usage and understand network application performance. Over the years, a lot of prior research has focused on the characterization of Internet traffic, starting from the early 1990s [9], and continuing to the present [11] (e.g., Web traffic [3, 6, 13, 15], peer-to-peer file sharing [4], YouTube [10, 12], online social networks [5], Netflix [1, 14]).

Network traffic measurement studies such as these are useful not only for workload characterization, but also for network troubleshooting, protocol debugging, and performance evaluation. Below, we discuss several influential prior works on Web-based and education-oriented systems, which provide background context for our own work.

In 1996, Arlitt, et al. [3] characterized Web server workloads by analyzing access logs recorded at Web servers. These logs record requests for Web site URLs, including time of request, client IP address, content accessed, and document size. They used six different data sets in their study: three from universities, two from research organizations, and one from a commercial Internet provider. They identified common workload characteristics, such as Zipf-like object popularities and heavy-tailed file size distributions. In addition to the broad understanding of Web server workloads, this paper also discussed the importance of Web object caching. Based on their findings, they proposed improved Web caching systems with frequency-based cache management policies.

In 1999, Breslau, et al. [6] conducted a detailed study of Web caching performance. In particular, they identified the presence of power-law structures in Web object referencing, which is often referred to as a Zipf or a Zipf-like distribution. They used mathematical models to relate the Zipf characteristics to the effectiveness of Web caching. Their study provides guidelines on the achievable performance for Web caches in the presence of different workload characteristics.

In 2001, Almeida, et al. [2] analyzed server log data for educational media servers at two major US universities (University of Wisconsin, and University of California, Berkeley). Their paper focused on the eTeach system and BIBS (Berkeley Internet Broadcasting System), which delivered high quality media content. Their study provides a benchmark against which future media server workloads can be compared.

Newton, et al. [15] conducted a long-term Web traffic measurement study to see how this traffic has changed over time. They used the TCP/IP packet headers (1999-2012) in packet traces collected on the Internet link for the University of North Carolina. They performed an in-depth analysis of HTTP request sizes and responses, identifying growth in the size and complexity of Web pages, and increased use of cookies.

Two common themes throughout these prior works are the importance of knowing the workload characteristics for a given network application, and the importance of caching to improve the performance of any network-based system. Our paper builds upon the methods and insights from these prior works, and focuses on the network performance of D2L, as an example of an LMS. We are particularly interested in a workload characterization of D2L, identifying its performance bottlenecks, and exploring the use of local content caching to improve D2L performance.

3 Methodology

Network traffic measurement refers to a set of well-established techniques to collect and analyze empirical data from an operational network [11, 20]. In general, these techniques can be classified based on the type of network monitor used (e.g., hardware or software), where the network monitor is placed (e.g., edge or core), and by the data collection approach (e.g., passive or active). In our work, we use a combination of passive and active measurements on our edge network.

In passive network measurements, data is gathered by listening to ambient network traffic, without generating any additional traffic that might interfere with this flow, or affect the measurements themselves. In our study, we use specialized networking hardware to collect information about all campus-level traffic on our edge network. Our network monitor records information about the inbound/outbound network traffic passing through the university's edge routers. This collection takes place through a mirrored stream of all packet-level Internet traffic entering/leaving the University of Calgary network.

Our network monitor is a Dell server, which processes the mirrored traffic stream. It is equipped with two Intel Xeon E5-2690 CPUs (32 logical cores @2.9 GHz), 64 GB RAM, and 5.5 TB of local hard disk storage for the logs. The operating system (OS) on this server is CentOS 6.6 x64. The monitor utilizes an Endace DAG 8.1SX for capturing the traffic and filtering it. It was designed for 10 Gbps Ethernet, and uses several programmable functions in the hardware to boost the performance of packet processing. The primary use of the Endace DAG card is to split the incoming traffic into streams for processing by the Bro logging system.
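As a concrete illustration of how such connection logs can be aggregated, the sketch below parses a small sample in Bro's tab-separated ASCII log format, in which a "#fields" header line names the columns. The sample records and the field subset shown here are illustrative only, not actual D2L data; a real conn.log carries many more fields.

```python
import io

# Two sample records in Bro's tab-separated ASCII log format; the
# "#fields" header line names the columns. Values are illustrative only.
sample = (
    "#fields\tts\tid.orig_h\tid.resp_h\tduration\torig_bytes\tresp_bytes\n"
    "1452555600.1\t10.0.0.5\t199.30.181.42\t12.5\t20000\t1500000\n"
    "1452555700.9\t10.0.0.7\t199.30.181.42\t3.0\t4000\t250000\n"
)

def summarize_conn_log(fh):
    """Total duration and per-direction byte counts across all connections."""
    fields = None
    totals = {"duration": 0.0, "orig_bytes": 0, "resp_bytes": 0}
    for line in fh:
        if line.startswith("#fields"):
            fields = line.rstrip("\n").split("\t")[1:]   # column names
        elif not line.startswith("#") and fields:
            row = dict(zip(fields, line.rstrip("\n").split("\t")))
            totals["duration"] += float(row["duration"])
            totals["orig_bytes"] += int(row["orig_bytes"])
            totals["resp_bytes"] += int(row["resp_bytes"])
    return totals

print(summarize_conn_log(io.StringIO(sample)))
# → {'duration': 15.5, 'orig_bytes': 24000, 'resp_bytes': 1750000}
```

Reading the column names from the #fields header, rather than hard-coding positions, keeps such a script robust across Bro versions, whose log layouts differ.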


Bro is an open-source framework for network analysis and security [8, 16]. In our work, the Bro logging system monitors all packet-level network activities, and produces connection-level logs summarizing all the traffic. Our primary interest is in the connection, HTTP, and SSL logs. The connection logs provide data regarding each observed connection, such as start time, end time, bytes transferred (inbound/outbound data), duration, and termination state. The HTTP log helps us identify the source/destination IPs, HTTP methods, hosts, URIs, referer URLs, and user agents. Finally, the SSL logs show us HTTPS connections, with fields like timestamps, TLS/SSL encryption methods, plus source/destination ports.

Bro collects and generates logs on an hourly basis, which we aggregate together to provide a semester-long view of D2L traffic. We collect and analyze data from the HTTP, SSL, and connection logs to produce the results reported in this paper.

In addition to the Endace/Bro data collection described above, we also use Wireshark [21] to collect packet-level details on several D2L test sessions from our own desktop computers. Wireshark captures packets in real time, and displays them in a human-readable format. Using Wireshark, we can explore the details of D2L interactions for our own test sessions.

Unlike passive approaches, active measurements generate extra packets on the network as part of their data collection process. These can be used to measure the time taken to reach a target destination, the capacity available for a network path, or the response time for an application. Since this category of measurement generates additional traffic, we have performed active measurements selectively, using basic active measurement tools like ping and traceroute that have minimal impact on the network. Where appropriate, we have also generated some small test sessions in D2L, in order to assess the impacts of different file sizes, browsers, and configuration settings. These sessions constitute a very small proportion of the overall D2L traffic observed.

4 Measurement Results

4.1 D2L Traffic Overview

Figure 1 provides a high-level overview of the D2L traffic observed on our campus network during the Winter 2016 semester (January-April 2016). Note that D2L traffic occurs over both HTTP (for initiation/termination of D2L sessions) and HTTPS (for actual D2L interactions), and that there is a strong correlation between the two types of traffic.

Our measurements illustrate the traffic volume, data volume, response time, and throughput for all D2L users during the Winter 2016 semester. When lectures began on January 11, the D2L traffic increased. The D2L traffic pattern varies throughout the semester, with a dip during Reading Week break in February, and a sharp decline after final exams in April.

The D2L traffic shows strong daily and weekly patterns, with weekday traffic far exceeding that on weekends. On a typical weekday, we observe about 16,000 HTTP requests to D2L from the University of Calgary network, and about 500,000 HTTPS requests to D2L. The 30-fold difference between HTTP and HTTPS indicates that most D2L interactions occur via HTTPS. The HTTP traffic is primarily for session initiation/termination.

4.2 HTTP Redirection Issue

The first D2L performance issue that we have identified is related to how D2L is configured to operate within the University of Calgary IT infrastructure. In particular, session initiation involves user authentication. This step actually involves several HTTP redirections to the Central Authentication Service (CAS) at the University of Calgary. These interactions are complex, and add noticeable latency to the D2L experience.

Figure 2 shows a schematic illustration of a D2L test session that we conducted. This session lasts about 10 minutes, and involves several steps. First, the initial attempt to contact D2L via HTTP is redirected to use HTTPS instead. Second, the request is redirected from D2L to CAS at the University of Calgary for user authentication. Third, the Web browser uses multiple TCP connections in parallel to load the CAS login page (i.e., CSS file, logo, background, Javascript). Fourth, once the user logs in successfully, another HTTPS redirection occurs to re-connect with D2L. The Web browser then launches multiple TCP connections to retrieve the different components of the D2L landing page, including colour template, university logo, menu buttons, and course home page. The user is now ready to begin their D2L session. In this test session, the user browses several course pages, viewing slides from the course content, and uploading and downloading a few files. Finally, the user logs out. During logout, another series of HTTP redirections occurs.
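Each of these redirections costs at least one network round trip before any useful content is delivered, so their latency contribution can be estimated with back-of-envelope arithmetic. The sketch below is illustrative only: the redirect count covers just the three login-time redirections described in this section, and the 40 ms RTT is the value reported later for the path to the D2L servers.

```python
# Each HTTP redirection costs at least one round trip to the server
# before the browser can issue the next request (TCP and TLS connection
# setup add further round trips on top of this lower bound).
rtt_s = 0.040        # measured round-trip time to the D2L servers
login_redirects = 3  # HTTP->HTTPS, D2L->CAS, and CAS->D2L after login
extra_s = login_redirects * rtt_s
print(f"redirect overhead: {extra_s * 1000:.0f} ms")  # → redirect overhead: 120 ms
```

Eliminating redundant redirections (such as the superfluous "logout" to "logout/" hop noted below) removes this overhead entirely for the affected requests.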

Figure 1: D2L Traffic Profile for Winter 2016. (a) HTTP requests per day; (b) HTTPS requests per day


Figure 2: Example of D2L Browsing Session (IIS 10.0)

The first of these is from D2L to an e-learning server hosted by the Taylor Institute for Teaching and Learning (TITL) at the University of Calgary. Next, the Web browser uses parallel TCP connections to load the different components of the session logout page. Finally, there is a superfluous HTTP redirection from the TITL server to itself, to change the URL from "logout" to "logout/".

4.3 Network Latency Issue

The second D2L performance problem relates to how far away the D2L server is from the University of Calgary. In particular, the network round-trip time (RTT) is about 40 ms, which is non-negligible. Figure 3 shows how an on-campus user accesses D2L. In this figure, the campus network is enclosed within a triangle, while the D2L hosted service in Ontario is indicated by the oval on the right. We are interested in characterizing the Internet path between the two.

Figure 3: Network Path for D2L Users on Campus

Figure 4 shows the traceroute results, which indicate that the D2L hosted service (i.e., desire2learn.ip4.torontointernetxchange.net) is located at a data center in Toronto. The network path has 17 hops, with a total RTT of 37 ms.

$ traceroute d2l.ucalgary.ca
traceroute to d2l.ucalgary.ca (199.30.181.42), 30 hops max, 60 byte packets
 1  deptNFSgate (172.17.10.1)  0.233 ms  0.217 ms  0.302 ms
 2  * * *
 3  10.58.48.1 (10.58.48.1)  0.367 ms  0.370 ms  0.363 ms
 4  * * *
 5  10.16.18.1 (10.16.18.1)  0.433 ms  0.404 ms  0.401 ms
 6  10.16.18.4 (10.16.18.4)  0.302 ms  0.246 ms  0.237 ms
 7  10.16.17.1 (10.16.17.1)  0.438 ms  0.403 ms  0.432 ms
 8  10.59.226.26 (10.59.226.26)  0.334 ms  0.324 ms  0.333 ms
 9  h74.gpvpn.ucalgary.ca (136.159.199.74)  3.296 ms  3.333 ms  3.471 ms
10  h66-244-233-17.bigpipeinc.com (66.244.233.17)  0.744 ms  0.633 ms  0.624 ms
11  h208-118-103-166.bigpipeinc.com (208.118.103.166)  0.880 ms  0.869 ms  0.836 ms
12  clgr2rtr2.canarie.ca (199.212.24.66)  0.721 ms  0.755 ms  0.726 ms
13  wnpg1rtr2.canarie.ca (205.189.33.199)  36.400 ms  36.180 ms  36.307 ms
14  canariecds.ip4.torontointernetxchange.net (206.108.34.170)  36.543 ms  36.514 ms  36.368 ms
15  desire2learn.ip4.torontointernetxchange.net (206.108.34.184)  36.668 ms  36.511 ms  36.484 ms
16  * * *
17  ucalgary.desire2learn.com (199.30.181.42)  36.727 ms  36.810 ms  36.770 ms

Figure 4: Traceroute Results for d2l.ucalgary.ca

A recurring theme in our study is the adverse impact of network latency on user-perceived performance in D2L. The performance of D2L is affected by these high RTT values. Users spend time waiting for responses from a distant data center in Toronto. This hinders the responsiveness of the D2L Web site, particularly when multiple HTTP/HTTPS redirections occur. Furthermore, the network bandwidth is not well utilized during TCP slow start, and D2L performance suffers.

4.4 TCP Throughput Issue

The third problem with D2L at our university is the TCP throughput, which is an important factor that affects network application performance. Using our empirical measurement data, we calculated the Average Data Rate (ADR) for D2L data transfers, which is the size of a transferred file divided by the elapsed time duration. This metric indicates the average throughput for D2L connections, in bits per second (bps).

Figure 5 shows a Log-Log Complementary Distribution (LLCD) plot of the ADR from some of our empirical data. The average ADR is 500 Kbps, with some data points up to 5 Mbps for inbound connections. A much lower ADR is seen for outbound connections, with the average being 50 Kbps, and a maximum ADR of around 350 Kbps. Note that these throughput values represent only the average, and not the instantaneous throughput. Specifically, they are calculated from the byte counts and the durations reported in the connection logs, and the duration includes all the TCP connection handshaking, slow start effects, and any timeouts used for persistent connections.

Figure 5: LLCD Plot of D2L Throughput

There are two intriguing observations in Figure 5. First, the throughput values are very low (i.e., much lower than expected on CANARIE's fast national network). Second, the average throughput differs for uploads and downloads, almost by a factor of two. To further investigate this issue, we conducted some D2L test sessions involving uploads and downloads for a single 3.2 MB data file. Table 1 summarizes the results of our experiment, which confirms the asymmetric performance for uploads and downloads. The highest throughput achieved for downloads was 14 Mbps, while that for uploads was 7 Mbps. These results were consistent across all test scenarios considered.

We used Wireshark and some active measurements to learn more about the TCP version and settings used by D2L data transfers. Wireshark provides information such as TCP options, maximum segment size (MSS), slow start, window size, sequence number analysis, and others. OS fingerprinting allows


Table 1: TCP Throughput for D2L Transfers (3.2 MB)

  Scenario              Device    OS          Download   Upload
  On campus, wired      Desktop   Windows 8   14 Mbps    7 Mbps
  On campus, wireless   Laptop    Mac OS X    14 Mbps    7 Mbps
  Off campus, wireless  Laptop    Mac OS X    14 Mbps    7 Mbps
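Both throughput quantities used in this section reduce to simple arithmetic: the ADR is the transferred byte count divided by the connection duration, and a window-limited TCP connection can move at most one receive window of data per round trip. The sketch below uses values measured in this paper (the 37 ms RTT from Figure 4, and the 64 KB receive window reported by Wireshark); the 3.2 MB over 50 seconds ADR example is hypothetical.

```python
def window_limited_mbps(window_bytes, rtt_s):
    """Upper bound on TCP throughput when the receive window is the
    bottleneck: at most one window of data in flight per round trip."""
    return window_bytes * 8 / rtt_s / 1e6

def adr_kbps(total_bytes, duration_s):
    """Average Data Rate: bytes transferred divided by elapsed duration."""
    return total_bytes * 8 / duration_s / 1e3

rtt = 0.037  # end-to-end RTT to the D2L data center (Figure 4)

# 64 KB receive window: the ~14 Mbps ceiling seen in Table 1.
print(round(window_limited_mbps(64 * 1024, rtt), 1))    # → 14.2

# A ~1 MB window would be needed to make better use of this path.
print(round(window_limited_mbps(1024 * 1024, rtt), 1))  # → 226.7

# Hypothetical ADR example: 3.2 MB moved over a 50-second connection,
# where the duration includes handshakes, slow start, and idle timeouts.
print(round(adr_kbps(int(3.2e6), 50.0), 1))             # → 512.0
```

The last example illustrates why the ADR values in Figure 5 sit far below the per-transfer rates of Table 1: connection durations in the logs include much more than the data transfer itself.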

us to infer the operating system (Windows 2008 R2) and TCP version (Compound TCP [18, 19]) used by the D2L server, which is running Microsoft's Internet Information Server (IIS version 7.5).

Figure 6 and Figure 7 illustrate the dynamics observed on a large file upload. Figure 6 shows a TCP sequence number plot of how the data is transferred over time. This graph shows several irregularities in the data transfer process. Figure 7 illustrates the TCP receiver window size advertised by the D2L server during a file upload. The first observation is that the maximum advertised window size is 64 KB, which is the default socket buffer size for Compound TCP. This is a very small window size to use on networks with a large delay-bandwidth product, such as our scenario. The second observation is that the advertised window size fluctuates a lot, indicating that the server is slow in processing the arriving data packets. There are a half-dozen occurrences of small windows where the data transfer is inhibited. There is even a window stall event between 11 and 12 seconds, where the receiver window is almost zero (395 bytes, too small for the uploader to send another MSS).

These results demonstrate that the D2L data transfer performance is window-limited. Even if data transfers were perfect, with 64 KB of data exchanged every 40 ms, the maximum throughput would be 14 Mbps, which is what we observed in our download experiments. A larger window size of approximately 1 MB would be required to better exploit the network path between Calgary and Toronto.

Understanding why upload performance (7 Mbps) is worse than downloads (14 Mbps) requires even further investigation. To obtain insight into this problem, we conducted our own active measurement experiment on a D2L test session using special software called mitmproxy, which acts as a man-in-the-middle (mitm) proxy. This software intercepts the traffic between a client and a server, and can report all the HTTP/HTTPS traffic requests made by the user.

With mitmproxy in place, we can view the details of our D2L test sessions, including HTTP requests/responses, file names/sizes, and response times. These experiments showed that downloads use the GET method, while uploads use the POST method. However, the file uploads involve many POST requests, each with a small transfer size. Furthermore, D2L internally updates a file directory structure on uploads, as indicated by UpdateTreeBrowser in one of its URLs. In addition, there is an activity feed popup right after the new content is uploaded into D2L, as a notification for the user. These activities all increase the delay for D2L file uploads.

5 Additional Results

In May 2018, we repeated our D2L traffic measurements to see what, if anything, had changed from our earlier measurements in Winter 2016. Our additional results focus on D2L server performance, recent upgrades, and potential Web proxy cache performance.

5.1 D2L Server

The first observation from our May 2018 measurements is that the server-side infrastructure for D2L has changed since Winter 2016. In particular, the HTTP response headers now show that the D2L Web server is running IIS 10.0, rather than IIS 7.5. The server hardware and operating system have also been upgraded. This upgrade has improved the performance for D2L file uploads, as shown in Figure 8. In this example of a 2.1 MB file upload, the TCP sequence number plot shows that the data transfer is completed in about 1.7 seconds, for an average throughput of 14 Mbps. Furthermore, Figure 9 shows that the server has no problem keeping up with the data, since the receiver advertised window for TCP is almost always 64 KB. To be specific, the initial advertised window is about 12 KB, but

Figure 6: File Upload with D2L's IIS 10.0 Server

Figure 7: TCP Receive Window for D2L File Upload


Figure 8: TCP Seqnum Plot for D2L File Upload

Figure 9: TCP Window during File Upload (IIS 10.0)

this window is dynamically resized by TCP until it reaches the maximum of 64 KB, and stays at that value for the rest of the transfer. The good news here is that uploads are now just as fast as downloads; the bad news is that uploads and downloads are both still window-limited. The TCP sequence number plot in Figure 8 still shows bursts of 64 KB at a time, with an ensuing wait of about 40 ms for the window's worth of acknowledgements to return. As a result, the average throughput never exceeds 14 Mbps, despite the multi-Gbps network path between the client and the server. TCP window scaling would be required to better utilize this network path.

5.2 D2L User Interface

On May 4, 2018, our university launched a new version of D2L with an improved user interface. The intent of the new release was to improve the user experience, with a new design layout, easier navigation, and mobile-friendly content. We collected detailed measurements both before and after the new release, in order to understand the differences. Importantly, this "upgrade" was only to the user interface, and did not address any of the fundamental network performance issues identified earlier.

While the new design is aesthetically pleasing, there are two new problems with the D2L deployment at our university. One problem is that the D2L home page is much larger and more complicated than it was before. Another problem is that D2L seems to have made almost all content uncacheable. We comment on these two issues next.

Table 2 provides a comparison of D2L Web pages both before and after the user interface upgrade. These are examples of pages in a D2L browsing session for a computer science student registered in the CPSC 413 course on Complexity Theory. The most striking observation is that the D2L home page, which all students visit by default upon login, has approximately doubled in size (KB) and complexity (number of objects). This results in slower Web browsing performance than before the upgrade. The other example pages are comparable to or smaller than they were before.

The second issue relates to content caching. Based on the analysis of HTTP response headers in Wireshark, most D2L content is now marked as uncacheable (i.e., "private, max-age=0"), unlike the previous version of D2L which allowed caching of static content (i.e., "public, max-age=3600"). Such content was typically cacheable for one hour (3600 seconds). The underlying reason for making content uncacheable is not clear. We speculate that it is to enable full tracking of student learning activities, with all Web requests coming to the origin server. Another possibility is that it reflects the new General Data Protection Regulation (GDPR) policy enacted by the European Union in April 2016, which became effective on May 25, 2018. In simple terms, this privacy regulation states that no user data should be stored and used without the consent and knowledge of the user. However, one drawback of this regulation, if naively applied, is that it could compromise network performance.

The downside of uncacheable content is that the content must be retrieved repeatedly from the origin server far away. The most glaring example in our experiments was the user profile image (21 KB), which was downloaded 61 times during a 10-minute browsing session. If the content is hosted far away, then these repeated transfers will affect the user-perceived Web

Table 2: Comparison of D2L Web Page Complexity

                        Before Update       After Update
  Type of Web Page      Objects   Size      Objects   Size
  D2L Home Page         21        638 KB    45        1,239 KB
  Course Home Page      27        466 KB    11        567 KB
  Course Content Page   21        219 KB    17        173 KB
  View Content Page     19        44 KB     16        52 KB
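The cacheability determination discussed in this section comes down to the Cache-Control response header. The sketch below is a deliberately simplified check (full HTTP caching rules, per RFC 7234, involve additional directives and headers such as Expires and Vary); the two header values tested are the ones quoted in the text.

```python
def is_cacheable_by_shared_cache(cache_control: str) -> bool:
    """Simplified test of whether a shared (proxy) cache may store a
    response, based only on its Cache-Control header. Real HTTP caching
    rules (RFC 7234) consider more directives and headers than this."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value
    if "private" in directives or "no-store" in directives:
        return False
    return "public" in directives or int(directives.get("max-age") or 0) > 0

# The two header values observed before and after the D2L upgrade:
print(is_cacheable_by_shared_cache("private, max-age=0"))    # → False
print(is_cacheable_by_shared_cache("public, max-age=3600"))  # → True
```

By this rule, a campus proxy could legitimately cache none of the content marked "private, max-age=0", which is why the repeated profile-image downloads described below all travel the full path to the origin server.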


browsing performance.

As with our earlier work, we have shared these results with the University of Calgary Information Technologies (UCIT) team at our university. They are in touch with D2L to find better technical solutions for our D2L performance problems. One obvious solution would be to host all D2L content on campus, or provide a CDN node on campus that stores this private information locally, rather than in the cloud. Another solution might be to ensure that private information is always stored and accessed in encrypted form.

5.3 Web Caching

Our final result relates to the potential benefits of Web proxy caching to improve D2L performance at our university. We use trace-driven simulation for this analysis, with an arbitrarily chosen D2L workload trace from February 14, 2018. There were approximately 402,000 Web object transfers on that day. Our discrete-event simulation models a simple Web proxy cache on campus, which stores and remembers previously seen static Web objects. Since the cache has a finite size, a cache replacement policy is used to manage the cache, and remove previous items when more space is needed to store a new item. We use this caching simulator to estimate the potential savings in D2L requests with a Web proxy cache.

We consider several different replacement policies to manage the cache. RAND (Random) removes a randomly-chosen object when more space is needed. FIFO (First In First Out) removes the oldest object in the cache. LRU (Least Recently Used) removes objects that have not been used recently. SIZE removes large objects, while keeping small objects. LFU (Least Frequently Used) removes objects that have not been used often. We consider two variants of the latter: In-Cache LFU (CLFU) only tracks the popularity of objects that are in the cache, resetting the popularity of an object to zero when it leaves the cache, while Global LFU (GLFU) always remembers the cumulative popularity of an object, whether it is still in the cache or not.

Figure 10 shows the results from our Web proxy cache simulations. On this graph, the horizontal axis shows the cache size in Megabytes (MB) on a logarithmic scale, while the vertical axis shows the Document Hit Ratio (DHR) on a linear scale. Higher values of DHR are better, since they represent more objects being retrieved from the local proxy cache on campus, and fewer requests going all the way to the D2L server.

Figure 10: Simulation Results for Web Proxy Cache

In general, the results in Figure 10 show very good potential for Web proxy caching. Even a modest size cache (10 MB) can provide a hit ratio of over 50%, reducing by half the number of requests to the origin server. With a cache size of 1 GB, the DHR would be approximately 70%. Figure 10 also shows that there are notable performance differences among the replacement policies used to manage the cache. The best policy is SIZE, which focuses on caching a large number of small objects, such as the icons, sprites, and logos used on many of the D2L pages. The frequency-based policies also perform well, with GLFU always outperforming CLFU (as expected). The LRU policy is next best. FIFO and RAND are poor cache management policies, as expected.

The main takeaway message from these experiments is that for D2L, a very simple and modest-sized Web proxy cache on campus would be highly effective in reducing D2L requests to the remote server. This cache would improve the user-perceived latency for accessing D2L content, and reduce repeated transfers of the same content across the Internet. Furthermore, in addition to the on-demand caching described above, there should be very good potential for doing pre-fetching (i.e., prediction) of D2L content requests, based on the browsing patterns of students.

Two caveats apply to these Web proxy caching results. First, these results assume that all D2L objects are static Web content, and eligible for caching. This assumption may not hold for dynamic content, or for uncacheable requests that D2L uses to track student learning behavior. Second, our caching study ignores the issue of end-to-end encryption, which D2L uses to protect user privacy. Any practical caching solution for D2L, such as a Web proxy cache or a CDN node, would have to consider these two practical issues.

6 Conclusions

In this paper, we presented an empirical measurement study of the Desire2Learn (D2L) Learning Management System (LMS) adopted for use at the University of Calgary. The motivation for our study was to gain a better understanding of the system configuration, and its performance limitations. While studying an LMS such as D2L is complex, there are three main technical issues that emerge from our study as root causes for the poor performance of D2L. The first issue is the excessive use of HTTP redirection at the University of Calgary to manage login/logout for D2L sessions. The second issue is the network RTT latency for D2L users in Calgary to access course content that is remotely hosted in Ontario. Finally, the TCP configuration on the D2L server has a maximum window size of 64 KB, which limits data transfer throughput.

The main conclusion from our study is that D2L is slow, and unnecessarily so. Fortunately, the observed performance problems are all fixable, as follows. First, we observed over one


million HTTP redirects during the Winter 2016 term, which could be eliminated to minimize network round-trips and reduce server load. Second, there is a 40 ms RTT latency for University of Calgary users to access D2L content. Using a content delivery network (CDN), or placing a CDN node locally on campus, could greatly accelerate content delivery. Our simulation-based study suggests that a simple Web proxy cache, properly managed, could reduce D2L content requests by 50-70%. Finally, the TCP window size used by D2L is small, and does not scale dynamically based on the observed characteristics of the network path. An expanded TCP window size would solve this problem, improving throughput for both uploads and downloads. We hope that the insights from our study will improve future deployments of the D2L LMS, both at our university and elsewhere.

Acknowledgements

Financial support for this research was provided by NSERC. The authors thank D'Arcy Norman at TITL for his detailed knowledge of D2L, and the CATA 2018 reviewers for their feedback and suggestions on an earlier version of this paper.

References

[1] V. Adhikari, Y. Guo, F. Hao, V. Hilt, Z-L. Zhang, M. Varvello, and M. Steiner, "Measurement Study of Netflix, Hulu, and a Tale of Three CDNs", IEEE/ACM Transactions on Networking, 23(6):1984-1997, December 2015.
[2] J. Almeida, J. Krueger, D. Eager, and M. Vernon, "Analysis of Educational Media Server Workloads", Proceedings of ACM NOSSDAV, Port Jefferson, NY, pp. 21-30, January 2001.
[3] M. Arlitt and C. Williamson, "Internet Web Servers: Workload Characterization and Performance Implications", IEEE/ACM Transactions on Networking, 5(5):631-645, October 1997.
[4] N. Basher, A. Mahanti, A. Mahanti, C. Williamson, and M. Arlitt, "A Comparative Analysis of Web and Peer-to-Peer Traffic", Proceedings of WWW, Beijing, China, pp. 287-296, April 2008.
[5] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida, "Characterizing User Behavior in Online Social Networks", Proceedings of ACM IMC, Chicago, IL, pp. 49-62, November 2009.
[6] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web Caching and Zipf-like Distributions: Evidence and Implications", Proceedings of IEEE INFOCOM, New York, NY, pp. 126-134, March 1999.
[7] Brightspace By D2L, The Brightspace Cloud Content Delivery Network, https://community.brightspace.com/resources, August 2017.
[8] Bro, The Bro Network Security Monitor, https://www.bro.org, January 2018.
[9] R. Caceres, P. Danzig, S. Jamin, and D. Mitzel, "Characteristics of Wide-area TCP/IP Conversations", Proceedings of ACM SIGCOMM, Zurich, Switzerland, pp. 101-112, August 1991.
[10] M. Cha, H. Kwak, P. Rodriguez, Y. Ahn, and S. Moon, "I Tube, You Tube, Everybody Tubes: Analyzing the World's Largest User-Generated Content Video System", Proceedings of ACM IMC, San Diego, CA, pp. 1-14, November 2007.
[11] M. Crovella and B. Krishnamurthy, Internet Measurement: Infrastructure, Traffic and Applications, John Wiley & Sons, 2006.
[12] P. Gill, M. Arlitt, Z. Li, and A. Mahanti, "YouTube Traffic: A View from the Edge", Proceedings of ACM IMC, San Diego, CA, pp. 15-28, November 2007.
[13] F. Hernandez-Campos, K. Jeffay, and F. Smith, "Tracking the Evolution of Web Traffic: 1995-2003", Proceedings of IEEE MASCOTS, Orlando, FL, pp. 16-25, October 2003.
[14] M. Laterman, M. Arlitt, and C. Williamson, "A Campus-Level View of Netflix and Twitch: Characterization and Performance Implications", Proceedings of SCS SPECTS, Seattle, WA, pp. 15-28, July 2017.
[15] B. Newton, K. Jeffay, and J. Aikat, "The Continued Evolution of Web Traffic", Proceedings of IEEE MASCOTS, San Francisco, CA, pp. 80-89, August 2013.
[16] V. Paxson, "Bro: A System for Detecting Network Intruders in Real-Time", Computer Networks, 31(23):2435-2463, December 1999.
[17] S. Roy, Characterizing D2L Usage at the U of C, MSc Thesis, University of Calgary, August 2017.
[18] K. Tan and J. Song, "Compound TCP: A Scalable and TCP-friendly Congestion Control for Highspeed Networks", Proceedings of 4th International Workshop on Protocols for Fast Long-Distance Networks, Nara, Japan, 8 pages, February 2006.
[19] K. Tan, J. Song, Q. Zhang, and M. Sridharan, "A Compound TCP Approach for High-Speed and Long-Distance Networks", Proceedings of IEEE INFOCOM, Barcelona, Spain, pp. 1217-1228, April 2006.
[20] C. Williamson, "Internet Traffic Measurement", IEEE Internet Computing, 5(6):70-74, November/December 2001.
[21] Wireshark, www.wireshark.org, January 2018.

Sourish Roy completed his B.Tech. in Computer Science and Engineering from St. Thomas College of Engineering and Technology in Kolkata, India, and his M.Sc. in Computer Science from the University of Calgary in Calgary, Alberta, Canada. His interests include computer networks, Software Defined Networking, and big data analytics. He is currently employed as a Big Data Engineer at Index Exchange in Toronto, Ontario, Canada.

122 IJCA, Vol. 25, No. 3, Sept. 2018

Carey Williamson is a Professor in the Department of Computer Science at the University of Calgary. He holds a BSc (Honours) in Computer Science from the University of Saskatchewan, and a PhD in Computer Science from Stanford University. His research interests include computer networks, internet protocols, network traffic measurement, simulation, and Web performance.

Rachel McLean is an undergraduate student at the University of Calgary. She is in her final year of study towards a BSc in Computer Science with a concentration in game design. In Spring 2018, she received a Canadian NSERC Undergraduate Student Research Award, allowing her to participate in network research under Dr. Carey Williamson.


An Evaluation of the Performance of Join Core and Join Indices Query Processing Methods

Reham M. Almutairi*, Mohammed Hamdi*
Southern Illinois University, Carbondale, IL, USA

Feng Yu†
Youngstown State University, Youngstown, OH, USA

Wen-Chi Hou*
Southern Illinois University, Carbondale, IL, USA

Abstract

Over the last few decades, much research has been devoted to designing and evaluating efficient join algorithms. The focus of this paper is a comparison of the two fastest join methods: Join indices and Join Core. Join indices generate index tables that contain the tuple identifiers of matching tuples; joins can then be performed by scanning each input relation only once. On the other hand, Join Core is a data structure that stores join relationships to facilitate join query processing. With Join Core, join queries can be answered without having to perform costly join operations. Both methods have been implemented, and we have performed extensive experiments on TPC-H benchmark datasets and queries. Results show that while both methods are much faster than conventional systems, such as MySQL, the Join Core approach is the fastest query processing method for the datasets studied.

Key Words: Query processing, join queries, equi-join, join indices, decision support systems.

______________________________
* Department of Computer Science. Email: [email protected], {mhamdi, hou}@cs.siu.edu.
† Department of Computer Science and Information Systems. Email: [email protected].

ISCA Copyright © 2018

1 Introduction

A database is a collection of data describing the activities of one or more related organizations. Queries are posed constantly by ordinary users and applications, and responding to requests for data drawn from an enormous amount of information is a fundamental task of database management systems [19]. Join operations combine data from two or more relations [16], and are arguably the most important operations in query processing. Since a client is waiting for the database to respond with an answer, the time needed to deliver results is critical. Consequently, much research has been dedicated to designing efficient join algorithms over the last few decades [1-2, 6-7, 11, 13-14, 17, 21-22].

The objective of this paper is to compare the performance of two of the most efficient join algorithms: Join Core and Join indices. Join indices [3, 4, 15, 18, 24] use indices to perform join operations, and it has been reported that Join indices outperform traditional join methods. On the other hand, Join Core [8] has also reported performance superior to traditional systems, e.g., MySQL, as it can answer a wide variety of queries without performing joins, including equi-, outer-, and anti-joins. In this paper, we conduct extensive experiments to compare the performance of these two join methods.

The rest of the paper is organized as follows. In Section 2, we give a review of the traditional join algorithms. Section 3 describes Join Core. Section 4 explains in detail the multi-way Jive-join using Join indices. Experimental results are reported in Section 5. Section 6 presents the conclusions and future work.

2 Traditional Join Algorithms

Traditional join algorithms can generally be categorized into three types: hash-based joins, sort-merge joins, and nested-loop joins. Here, we briefly review these methods.

2.1 Nested-Loop Joins

The nested-loop join is the simplest of the three traditional joins. The two relations that participate in the join operation are called the outer relation (or the source relation) and the inner relation (or the target relation). Nested-loop algorithms are one-and-a-half-pass algorithms: each tuple in the outer relation is scanned once, while the inner relation is scanned repeatedly. Nested-loop joins do not require that the relations fit into memory; relations of any size can be used [6]. The nested-loop join is most appropriate for smaller relations.

2.2 Sort-Merge Joins

The fundamental idea of the sort-merge join is to sort both relations on the join attribute using one of many sorting methods (e.g., n-way merge sort), and then to search for qualifying tuples from both relations by merging the two sorted relations [5].

2.3 Hash-Based Joins

The main idea behind hash-based (partition) joins is to use hashing to partition the relations and match tuples in the respective partitions. Tuples in the outer relation, called the build relation, are partitioned into smaller files so that each partition can fit into memory. Tuples in the inner relation, called the probe relation, are partitioned using the same hash function. Then, each partition of the build relation is read into memory to build a hash table, which the tuples in the corresponding partition of the probe relation probe to find matches. Hash-based joins are generally very efficient among the traditional methods [20].

Figure 2 shows an example of a database with the join graph shown in Figure 1. An edge between tuples, e.g., S and W, indicates that S and W have the same join attribute values for join 4.

[Figure 2: Matching of join attributes]
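To make the three traditional algorithms concrete, the following is a minimal, illustrative Python sketch of each, on toy relations of (join-key, payload) tuples. The function names and data are ours, not from the paper's Java implementation, and the on-disk partitioning and buffering of the hash-based join are omitted.

```python
# Illustrative sketches of the three traditional join algorithms of
# Section 2. Relations are lists of (join-key, payload) tuples.

def nested_loop_join(outer, inner):
    # Scan the outer relation once; rescan the inner relation per tuple.
    return [(o, i) for o in outer for i in inner if o[0] == i[0]]

def sort_merge_join(r, s):
    # Sort both relations on the join attribute, then merge them.
    r, s = sorted(r), sorted(s)
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            while i < len(r) and r[i][0] == key:
                out.extend((r[i], s[jj]) for jj in range(j, j_end))
                i += 1
            j = j_end
    return out

def hash_join(build, probe):
    # Build a hash table on the build relation, probe with the other.
    table = {}
    for b in build:
        table.setdefault(b[0], []).append(b)
    return [(b, p) for p in probe for b in table.get(p[0], [])]

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
# All three algorithms produce the same set of matching pairs.
assert sorted(nested_loop_join(R, S)) == sorted(sort_merge_join(R, S)) \
       == sorted(hash_join(R, S))
```

The difference between the methods is purely in access pattern and memory behavior: the result sets are identical.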

3 Join Core

In this section, we briefly describe the Join Core technique [8].

3.1 Basis

Join Core is a data structure created to facilitate the processing of complex join queries. It is a collection of the equi-join relationships of tuples: the join relationships are stored as sets of tuples, grouped by the equi-joins they satisfy. It is intended to answer join queries without the need to perform expensive joins, and it has been shown that it can answer a wide variety of join queries that include arbitrary (legitimate) combinations of equi-, semi-, outer-, and anti-joins.

3.2 Structure

A join graph is commonly used to describe the equi-join relationships between pairs of relations.

Join Graph of a Database. Let D be a database with n relations R1, R2, …, Rn, and let G(V, E) be the join graph of D, where V is the set of nodes representing the relations in D, i.e., V = {R1, R2, R3, ..., Rn}, and E = {(Ri, Rj) | Ri, Rj ∈ V, i ≠ j} is the set of edges, each of which represents an equi-join relationship defined between Ri and Rj.

Figure 1 shows the join graph of a database. Relations are connected by the (equi-)join edges. Each edge specifies the equality requirements that a pair of tuples from the two relations, e.g., R1 and R2, must satisfy to generate a result tuple in the equi-join. Each join, or edge, is assigned a number as its ID. For example, the join between R1 and R2 is given the ID 4.

[Figure 1: Join graph]

[8] also introduces the concept of trivial (equi-)joins, with join predicates Ri.key = Ri.key, 1 ≤ i ≤ 3; each tuple in relation Ri automatically satisfies the trivial join i. Notice that all join edges in the join graph are non-trivial joins (regular equi-joins), e.g., 4 and 5; the IDs 1, 2, and 3 are reserved for the trivial joins, which are not displayed in Figure 1.

Maximally Extended Match Tuple. Given a database D = {R1, …, Rn} and its join graph G, an extended match tuple (tk, …, tl), where 1 ≤ k, ..., l ≤ n, tk ∈ Rk, …, tl ∈ Rl, and Rk, …, Rl are all distinct relations, represents a set of tuples {tk, …, tl} that generates a result tuple in {tk} ⋈ … ⋈ {tl}. A maximally extended match tuple (tk, …, tl) is an extended match tuple such that no tuple tm in a relation Rm ∉ {Rk, …, Rl} matches any of the tuples tk, …, tl on the join attribute values.

Example. The join relationships of the tuples in Figure 2 are captured in the maximally extended match format [8], namely, (S, W), (B), (J, Q, L), and (T, N). These join relationships are stored in different tables based on the joins, both trivial and non-trivial, that they satisfy. For example, (S, W) satisfies join 4 and trivial joins 1 and 2; therefore, it is stored in a table called J1,2,4. B is a tuple in R1 that does not satisfy any non-trivial join, and thus is stored in J1. These tables form the Join Core, as shown in Figure 3.

[Figure 3: Join core structure]


3.3 Join Core Construction

The Join Core can be constructed easily by first consecutively performing an outer-join for each join edge in the join graph, and then assigning the resulting tuples to Join Core tables based on the joins, both trivial and non-trivial, that they have satisfied.

3.4 Answering Join Queries

Answering a join query amounts to looking for the tables whose names contain the join edges specified in the query, without having to perform any joins. For example, to answer the join query R2 ⋈ R3, the algorithm looks for Join Core tables whose names contain the numbers 2, 3, and 5. Only tables J1,2,3,4,5 and J2,3,5 meet the search criteria, and they are returned as answers after some simple processing, such as removing unwanted attributes. Consequently, join queries can be answered rapidly. It has been shown that Join Core can be directly applied to queries with arbitrary legitimate combinations of equi-, semi-, outer-, and anti-joins.

[Figure 4: Database (the Student relation (R1), the Activity relation (R2) with activity codes, and the join index J of matching tuple-ids)]

[Figure 5: Partitioning J (partitions (a) Id < 3 and (b) Id >= 3, each with a temporary file Temp and an output file JR1)]
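The table-lookup idea of Section 3.4 can be sketched as follows, using the example tuples above. This is an illustrative Python toy (the paper's implementation is in Java): each Join Core table is modeled as a dictionary entry keyed by the set of join IDs (trivial joins 1-3, non-trivial joins 4 and 5) that its tuples satisfy.

```python
# Toy model of Join Core query answering: tables keyed by the frozenset
# of join IDs their tuples satisfy, per the example of Section 3.

join_core = {
    frozenset({1, 2, 4}):       [("S", "W")],      # (S, W) satisfies join 4
    frozenset({1}):             [("B",)],          # B satisfies only trivial join 1
    frozenset({1, 2, 3, 4, 5}): [("J", "Q", "L")],
    frozenset({2, 3, 5}):       [("T", "N")],
}

def answer(query_join_ids):
    # Return the rows of every table whose name (ID set) contains all
    # join IDs of the query; no join operation is performed.
    need = frozenset(query_join_ids)
    rows = []
    for name, table in join_core.items():
        if need <= name:       # subset test on the table name
            rows.extend(table)
    return rows

# The query R2 joined with R3 involves trivial joins 2 and 3 plus the
# non-trivial join 5, so only J{1,2,3,4,5} and J{2,3,5} qualify.
result = answer({2, 3, 5})
```

Answering the query is thus a name lookup plus projection, which is why Join Core responds essentially instantly in the experiments of Section 5.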

4 Jive-Join

This section presents the Jive-join technique. It explains the terminology and the basic structure of the Jive-join algorithm.

4.1 Basis

This algorithm needs only one sequential pass through each input relation, one sequential pass through the join index, and two sequential passes through a temporary file whose size is half the size of the join index. To reduce the use of main memory, the algorithm stores the intermediate and final results vertically on disk. The Jive-join algorithm can thus join relations that are much larger than main memory [11].

4.2 Structure

In the Jive-join method, the join index has a tuple-id column for every input relation; it is a set of tuple-id combinations (t1, t2, t3, …, ti) from relations R1, R2, R3, …, Ri. The algorithm divides the tuple-ids of each relation Rr into Kr ranges, for r = 2, …, i. Every combination (t2, …, ti) of tuple-ids then belongs to one of Y = K2K3…Ki partitions, and the algorithm processes each partition separately [9]. We use an example database with two relations, Student and Activity, shown in Figure 4.

The join index table, namely J, contains the tuple-ids of the matching tuples from each relation. First, the algorithm divides the join index table into two partitions, as shown in Figure 5. Every partition has an associated output file buffer and an associated temporary file buffer; when these buffers are full, each flushes its contents to a corresponding output file or temporary file on disk.

In the second step, the algorithm scans J and R1 sequentially and determines the partition each tuple belongs to, based on the R2 tuple-ids in J. Then it executes two operations: it writes the attributes of R1 to the output file buffer of the partition, and it writes the R2 tuple-ids to the corresponding temporary file buffer of the partition, as shown in Figure 5.

In step 3, for each partition, the algorithm reads in and sorts the temporary file with duplicates eliminated, keeping the original version of the temporary file. Next, it scans relation R2, reading only the tuples mentioned in the temporary file, and retrieves those tuples in tuple-id order. After that, the tuples are written to the output file JR2 in the order of the original temporary file, as shown in Figure 6.

[Figure 6(a): Output file (JR2): partition (a)]
[Figure 6(b): Output file (JR2): partition (b)]
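The three steps above can be sketched on the Student/Activity example of Figures 4-6. This is a minimal Python toy (the paper's implementation is in Java) in which the output and temporary "files" are simulated with in-memory lists.

```python
# Sketch of the two-relation Jive-join of Section 4.2 on the example of
# Figures 4-6. R1 = Student, R2 = Activity, keyed by tuple-id.

student  = {1: "Smith", 2: "Bloggs", 3: "John", 4: "Davis", 5: "Mark"}
activity = {1: ("Swimming", "084"), 2: ("Squash", "182"),
            3: ("Tennis", "219"), 4: ("Golf", "100")}
J = [(1, 1), (2, 4), (3, 2), (5, 3)]  # join index: (R1 tuple-id, R2 tuple-id)

# Step 1: partition the R2 tuple-id range, here into (a) id < 3, (b) id >= 3.
parts = {"a": lambda t: t < 3, "b": lambda t: t >= 3}
jr1  = {p: [] for p in parts}   # output "files" of R1 attributes
temp = {p: [] for p in parts}   # temporary "files" of R2 tuple-ids

# Step 2: one sequential pass over J (and R1); write the R1 attribute to
# the partition's output file and the R2 tuple-id to its temporary file.
for t1, t2 in J:
    p = next(name for name, pred in parts.items() if pred(t2))
    jr1[p].append(student[t1])
    temp[p].append(t2)

# Step 3: per partition, sort the temporary file with duplicates removed,
# fetch the R2 tuples in tuple-id order (one selective scan of R2), then
# emit them in the original temporary-file order.
jr2 = {}
for p in parts:
    fetched = {tid: activity[tid] for tid in sorted(set(temp[p]))}
    jr2[p] = [fetched[tid] for tid in temp[p]]
```

Row k of jr1[p] and row k of jr2[p] together form the k-th join result tuple of partition p, which is exactly the transposed-file layout described next.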


A vertically partitioned data structure, called a transposed file, is used to store the final join result [11]. The partitions of each relation are linked together into a single file JRi, separate from the partitions of the other relations, such that the first rows of the files together form the first join result tuple, the second rows form the second join result tuple, and so on. Note that attributes shared by more than one relation are placed arbitrarily in one of the vertical partitions, as shown in Figure 7.

[Figure 7: Final join result]

4.3 Multi-Way Jive-Join

The structure of the multi-way Jive-join is not much different from the two-relation case. In the multi-way Jive-join, the join index has a tuple-id column for every input relation; it is a set of tuple-id combinations (t1, t2, t3, …, ti) from relations R1, R2, R3, …, Ri. The algorithm divides the tuple-ids of each relation Rr into Kr ranges, for r = 2, …, i, so that every combination (t2, …, ti) of tuple-ids belongs to one of Y = K2K3…Ki partitions, and then processes each partition separately [10]. We use an example database with four relations, Student, Course, Instructor, and Room, shown in Figure 8.

[Figure 8: Database (Student (R1), Course (R2), Instructor (R3), and Room (R4))]

In the second step, the algorithm scans J, shown in Table 1, and R1 sequentially, and determines the partition each tuple belongs to, based on the R2, R3, and R4 tuple-ids in J. Then it executes two operations: it writes the attributes of R1 to the output file buffer of the partition, and it writes the R2, R3, and R4 tuple-ids to the corresponding temporary file buffer of the partition [11], as shown in Figure 9.

Table 1: Join index (J)

Student | Course | Time | Room
2 | 9 | 8 | 4
3 | 4 | 3 | 5
4 | 2 | 2 | 2
5 | 5 | 4 | 1
6 | 6 | 5 | 6
7 | 2 | 2 | 2
8 | 3 | 1 | 8

[Figure 9: Partitioning J]

After finishing step 2, part of the output, namely JR1, has been generated. The partitions of JR1 are linked together into a single file, JR1, as shown in Figure 10. Also, as shown in Figure 11, several temporary files have been generated; they are used in step 3 to generate the other parts of the output: JR2, JR3, and JR4.

For each of the relations R2, R3, and R4, the tuple-ids in the temporary files of all partitions of that relation are copied into one large array, namely H1, H2, or H3, corresponding to relations R2, R3, and R4, respectively. Each array is then sorted in ascending order with duplicates rejected. Next, the algorithm scans each relation and the corresponding array, and retrieves the tuples in order. After that, the tuples are written to the output files JR2, JR3, and JR4 in the order of the original temporary files, as shown in Figure 12.
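The per-relation H-array step above can be sketched as follows. This is an illustrative Python toy: the tuple-ids echo Table 1's Course column, but the tuple values and the split of the temporary files across partitions are placeholders we chose, not the paper's actual data.

```python
# Sketch of step 3 of the multi-way Jive-join: for one relation, the
# tuple-ids from the temporary files of all partitions are gathered into
# one array H, sorted with duplicates dropped, the relation is scanned
# once in tuple-id order, and each partition's slice of the vertical
# output file is emitted in the original temporary-file order.

def emit_output(relation, temp_files):
    # temp_files: one list of tuple-ids per partition.
    h = sorted({tid for part in temp_files for tid in part})  # H: sort + dedup
    fetched = {tid: relation[tid] for tid in h}               # one ordered scan
    return [[fetched[tid] for tid in part] for part in temp_files]

# Placeholder Course relation (R2) keyed by tuple-id; the two temporary
# files reuse ids from Table 1's Course column with an assumed split.
course = {i: "t%d" % i for i in range(1, 10)}
jr2_parts = emit_output(course, [[9, 4, 2], [5, 6, 2, 3]])
```

Because H is sorted and duplicate-free, each relation is read only once and only at the needed tuple-ids, while the final output still lines up row-by-row with JR1.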


[Figure 10: Partitions of JR1]

[Figure 11: Temporary files]

[Figure 12: Output files JR2, JR3, and JR4]

A vertically partitioned data structure, called a transposed file, is used to store the final join result [12]. The partitions of each relation are linked together into a single file JRi, separate from the partitions of the other relations, in such a way that the first rows of the files together form the first join result tuple, the second rows form the second join result tuple, and so on. Figure 13 shows the final output of this join. Note that attributes shared by more than one relation are placed arbitrarily in one of the vertical partitions. For example, the Course number column, which is common to R1 and R2 (the Student and Course relations, respectively), is placed arbitrarily in only one vertical file: JR1 or JR2.

[Figure 13: Final Result]

5 Experimental Results

In this section, we report the experimental results on the multi-way Jive-join and Join Core, measuring their time and space consumption. We have implemented multi-way Jive-join and Join Core in the Java programming language. Many factors, such as the number of CPUs and disks (and the type of disk, magnetic or SSD), can affect the performance of query processing. In this preliminary study, we use only the simplest setup, to see how the proposed methods alone can improve query processing, leaving other performance-improving factors to future work. All experiments are performed on a laptop computer with a 1.80 GHz CPU, 4 GB of RAM, and a 930 GB hard drive.

5.1 Datasets

We generate TPC-H benchmark datasets with three sizes: 1 GB, 4 GB, and 10 GB. The datasets have eight separate tables. Figure 14 shows the TPC-H schema; the arrows specify the direction of the many-to-one relationships between tables.

[Figure 14: TPC-H Schema (R1: Customer, R2: Orders, R3: Lineitem, R4: Part, R5: Partsupp, R6: Supplier, R7: Nation, R8: Region)]

5.2 Space Consumption

In the TPC-H benchmark datasets, the Lineitem table is the largest and Region is the smallest. For the 1 GB dataset, Lineitem has 6,001,215 tuples, Orders has 1,500,000 tuples, and Customer has 150,000 tuples. For the 4 GB dataset, Lineitem has 23,996,604 tuples, Orders has 6,000,000 tuples, and Customer has 600,000 tuples. For the 10 GB dataset,


Lineitem has 42,297,504 tuples, Orders has 15,000,000 tuples, and Customer has 1,500,000 tuples. However, in all three datasets, Nation has 25 tuples and Region has only five.

Table 2 shows the index table size for each query on our datasets. Notice that even when the dataset is large, the index tables are small, because an index table contains tuple identifiers instead of tuple values. On the other hand, the full Join Core sizes, without applying any space reduction methods, are 4, 13.8, and 39.7 GB for the 1, 4, and 10 GB TPC-H datasets, respectively, as shown in Table 3. The larger sizes of the Join Cores are mainly due to the replication of tuples of the smaller relations. In addition, there are several cycles in the join graph, each of which introduces an additional alias relation. "Reduced 1" in Table 3 is the result after removing the Region, Nation, and Supplier relations; "Reduced 2" is the result after further removing the Customer relation from "Reduced 1" [8]. Consequently, the multi-way Jive-join method has better memory utilization than Join Core.

Table 2: Index tables' size

Query | 1GB Dataset | 4GB Dataset | 10GB Dataset
Q12 / Q4: {R2, R3} | 89.7 MB | 118 MB | 374 MB
Q14 / Q19: {R3, R4} | 87.3 MB | 121 MB | 367 MB
Q16: {R4, R5} | 11 MB | 53 MB | 102 MB
Q3 / Q18: {R1, R2, R3} | 91.6 MB | 394 MB | 715 MB
Q10: {R1, R2, R3, R7} | 97 MB | 406 MB | 802 MB
Q2: {R4, R5, R6, R7, R8} | 24.8 MB | 92 MB | 135 MB
Q5: {R1, R2, R3, R6, R7, R8} | 104.8 MB | 472 MB | 844 MB

Table 3: Join Core size [8]

Join Core Size | 1GB | 4GB | 10GB
Full | 4 GB | 13.8 GB | 39.7 GB
Reduced 1 | 2.3 GB | 7.1 GB | 20.1 GB
Reduced 2 | 1.7 GB | 5.4 GB | 15.8 GB

5.3 Time Consumption

We have applied the test queries that come with the TPC-H benchmark. We removed any grouping, aggregation, and ordering functions from the test queries so that we could focus only on the join operations, and added "distinct" to the queries to generate distinct result tuples.

The multi-way Jive-join algorithm reads the tables from disk into main memory to perform the join operation, and then writes the result tuples to disk. The time from the beginning until the first result tuple is written to disk is called the response time, while the time from the beginning until all result tuples are written to disk is the elapsed time. For each query, we measured the elapsed time on all three datasets: 1, 4, and 10 GB.

Table 4 shows the query processing times for the full Join Core, multi-way Jive-join, and MySQL. The first column identifies the ID of the TPC-H query, followed by the relations that participate in the query; relations are numbered as shown in the TPC-H schema (Figure 14). The second column is the Join Core elapsed time for the 1, 4, and 10 GB datasets. The next two columns are the multi-way Jive-join and MySQL elapsed times, respectively. The final column is the number of result tuples for each query. All times are measured in seconds. We ignored queries that took more than 4 hours (14,400 seconds), as shown by the dashes in the table.

As we see in Table 4, the queries processed with MySQL took longer than with either multi-way Jive-join or Join Core. For example, processing query 16 on the 1 GB dataset took 107.32 seconds with MySQL, but 47.030 seconds with multi-way Jive-join and 0.812 seconds with Join Core, both much less than the MySQL processing time.

Table 4: Time consumption (elapsed times in seconds for the 1 / 4 / 10 GB datasets)

Query | Join Core | Multi-way Jive-join | MySQL | Result Tuples
Q12: {R2, R3} | 5.456 / 22.409 / 56.023 | 48.219 / 161.532 / 382.157 | 403.21 / 817.02 / 2,714 | 38,928 / 155,585 / 388,058
Q14: {R3, R4} | 0.502 / 1.865 / 3.865 | 36.453 / 203.187 / 477.594 | 465.10 / 1,422 / 2,203 | 1,717 / 6,718 / 16,943
Q19: {R3, R4} | 0.012 / 0.041 / 0.103 | 68.093 / 277.704 / 604.438 | 308.75 / 971.14 / 1,705 | 200 / 864 / 2,096
Q4: {R2, R3} | 0.397 / 1.518 / 3.625 | 34.297 / 124.547 / 302.969 | 336.40 / 911.38 / 2,488 | 3,040 / 11,889 / 29,447
Q16: {R4, R5} | 0.812 / 3.005 / 9.686 | 47.030 / 269.782 / 721.328 | 107.32 / 446.21 / 901.74 | 3,795 / 15,208 / 38,195
Q3: {R1, R2, R3} | 1.579 / 8.016 / 17.455 | 51.688 / 155.563 / 941.687 | 7,256 / 10,342 / - | 11,620 / 45,395 / 114,003
Q18: {R1, R2, R3} | 0.010 / 0.012 / 0.013 | 36.234 / 154.156 / 412.047 | 57.516 / 163.65 / 427.14 | 6 / 11 / 22
Q10: {R1, R2, R3, R7} | 1.706 / 5.667 / 14.560 | 36.594 / 248.625 / 325.484 | 2,816 / 5,516 / 8,341 | 3,773 / 14,800 / 36,975
Q2: {R4, R5, R6, R7, R8} | 1.890 / 7.005 / 18.609 | 26.313 / 401.875 / 811.406 | 691.38 / 2,810 / 6,447 | 3,162 / 12,723 / 31,871
Q5: {R1, R2, R3, R6, R7, R8} | 1.760 / 6.809 / 16.355 | 38.031 / 218.469 / 398.359 | 9,471 / - / - | 15,196 / 60,798 / 152,102


Additionally, from Table 4, we notice that the query processing time with multi-way Jive-join is larger than with the Join Core algorithm. This is because multi-way Jive-join needs to generate temporary files and the output files JRi, while the Join Core algorithm does not need to generate any intermediate results or perform expensive join operations; its queries are answered instantly. For example, processing query 2 on the 1 GB dataset took 26.313 seconds with multi-way Jive-join, but only 1.890 seconds with Join Core.

As observed from Table 4, the query processing time generally increases with the size of the dataset for all methods. For example, query 19 took 68.093, 277.704, and 604.438 seconds with multi-way Jive-join on the 1, 4, and 10 GB datasets, respectively. Larger datasets have a larger number of partitions, and thus generate larger output files and temporary files, which require more time to read and write.

We point out that the sizes of the query results determine the query processing time for the Join Core method. For example, consider queries 4 and 18. Query 4 has one join and generates a larger number of result tuples; query 18 has two joins, including the join of query 4, but generates a smaller number of result tuples. Consequently, the processing time for query 4 is larger than that for query 18: as shown in Table 4, it took 0.397, 1.518, and 3.625 seconds to process query 4 on the 1, 4, and 10 GB datasets, respectively, but only 0.010, 0.012, and 0.013 seconds to process query 18.

For multi-way Jive-join, on the other hand, the complexity of the query, that is, the number of joins, largely determines the query processing time, as illustrated by the elapsed times of queries 4 and 18. Even though query 4 generates a larger number of results, it took less time than query 18 since it has only one join; query 18 generates fewer results but took longer to complete because it has two joins. As shown in the table, it took 34.297, 124.547, and 302.969 seconds to process query 4 on the 1, 4, and 10 GB datasets, respectively, and 36.234, 154.156, and 412.047 seconds to process query 18. Hence, the complexity of the query, not the result size, determines the query processing time in multi-way Jive-join.

Our experimental results show that multi-way Jive-join has lower space consumption than Join Core. Main memory is usually shared by more than one task, so consuming less memory for the join operation allows other tasks to use the main memory at the same time. Our results also show that query processing with the multi-way Jive-join algorithm is faster than with MySQL. However, multi-way Jive-join takes longer to process queries than the Join Core strategy, because it must generate output files and temporary files before generating the final results, whereas the Join Core algorithm responds instantly.

6 Conclusions

In this paper, we have implemented a version of Join indices, the multi-way Jive-join, to contrast its performance with the most recent join algorithm, Join Core. Join indices produce index tables that contain the tuple identifiers of matching tuples; the method scans each input relation just once, the join index once, and the temporary files twice. On the other hand, Join Core is a data structure created to answer complex join queries promptly. With the Join Core, join queries can be handled rapidly without performing costly join operations, and no intermediate results need to be created or retrieved during processing. We generated TPC-H benchmark datasets of three sizes, 1 GB, 4 GB, and 10 GB, for our experiments. Our results demonstrate that the multi-way Jive-join method has better memory usage than Join Core, and that processing queries with multi-way Jive-join is quicker than with MySQL. However, our results also demonstrate that the multi-way Jive-join method has longer processing times than the Join Core technique.

References

[1] Mihaela A. Bornea, Vasilis Vassalos, Yannis Kotidis, and Antonios Deligiannakis, "Double Index Nested-Loop Reactive Join for Result Rate Optimization," IEEE 25th International Conference on Data Engineering (ICDE '09), IEEE, pp. 481-492, 2009.
[2] Shumo Chu, Magdalena Balazinska, and Dan Suciu, "From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System," Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ACM, pp. 63-78, 2015.
[3] Joseph Vinish D'silva, Bettina Kemme, Richard Grondin, and Evgueni Fadeitchev, "Powering Archive Store Query Processing via Join Indices," Proceedings of the 20th International Conference on Extending Database Technology (EDBT), pp. 644-655, March 21-24, 2017.
[4] Bipin C. Desai, "Performance of a Composite Attribute and Join Index," IEEE Transactions on Software Engineering, 15(2):142-152, 1989.
[5] Jost Enderle, Matthias Hampel, and Thomas Seidl, "Joining Interval Data in Relational Databases," Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, ACM, pp. 683-694, 2004.
[6] Goetz Graefe, "New Algorithms for Join and Grouping Operations," Computer Science - Research and Development, 27(1):3-27, 2012.
[7] Gunasekaran Hemalatha and Thanushkodi Keppana Gowder, "New Bucket Join Algorithm for Faster Join Query Results," International Arab Journal of Information Technology (IAJIT), 12(6A):701-707, 2015.
[8] Mohammed Hamdi, Feng Yu, Sarah Alswedani, and Wen-Chi Hou, "Storing Join Relationships for Fast Join Query Processing," International Conference on Database and Expert Systems Applications, Springer, Cham, pp. 167-177, 2017.
[9] Ralph Kimball and Kevin Strehlo, "Why Decision Support Fails and How to Fix It," ACM SIGMOD Record, 24(3):92-97, 1995.
[10] Ramon Lawrence, "Early Hash Join: A Configurable Algorithm for the Efficient and Early Production of Join Results," Proceedings of the 31st International Conference on Very Large Data Bases, VLDB Endowment, pp. 841-852, 2005.
[11] Zhe Li and Kenneth A. Ross, "Fast Joins Using Join Indices," The VLDB Journal, 8(1):1-24, 1999.
[12] Stefan Manegold, Peter A. Boncz, and Martin L. Kersten, "What Happens During a Join? Dissecting CPU and Memory Optimization Effects," Proceedings of the 26th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., pp. 339-350, 2000.
[13] Stefan Manegold, Peter Boncz, Niels Nes, and Martin Kersten, "Cache-Conscious Radix-Decluster Projections," Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB Endowment, 30:684-695, 2004.
[14] Mohamed F. Mokbel, Ming Lu, and Walid G. Aref, "Hash-Merge Join: A Non-Blocking Join Algorithm for Producing Fast and Early Join Results," Proceedings of the 20th International Conference on Data Engineering, IEEE, pp. 251-262, 2004.
[15] Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra, "Worst-Case Optimal Join Algorithms," Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, ACM, pp. 37-48, 2012.
[16] Hung Q. Ngo, Christopher Ré, and Atri Rudra, "Skew Strikes Back: New Developments in the Theory of Join Algorithms," ACM SIGMOD Record, 42(4):5-16, 2014.
[17] Patrick O'Neil and Goetz Graefe, "Multi-Table Joins through Bitmapped Join Indices," ACM SIGMOD Record, 24(3):8-11, 1995.
[18] Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems, 3rd ed., McGraw Hill, 2003.
[19] Michael L. Rupley Jr., Introduction to Query Processing and Optimization, technical_reports/TR-20080105-1.pdf, Indiana University, 2008.
[21] Dong Keun Shin and Arnold Charles Meltzer, "A New Join Algorithm," ACM SIGMOD Record, 23(4):13-20, 1994.
[22] TPC-H, http://www.tpc.org:80/information/benchmarks.asp, 2015.
[23] Patrick Valduriez, "Join Indices," ACM Transactions on Database Systems (TODS), 12(2):218-246, 1987.

Reham Almutairi received her B.S. in 2011 from the University of Dammam in Saudi Arabia. In 2014, Reham got a scholarship from her government to study in the US. In 2017, she received her M.S. from Southern Illinois University in Carbondale. Studying in a different country with students from different cultures taught her that people have to be confident and continue to overcome their setbacks to adapt to a new culture! Her main research interests are databases, data mining, and big data. She can be contacted at [email protected].

Mohammed Hamdi is an Assistant Professor in the Department of Computer Science at Najran University, KSA. He received his master's and PhD degrees in Computer Science from Southern Illinois University-Carbondale in 2013 and 2018, respectively. His main research interests are databases, query optimization, data mining, and big data.

Feng "George" Yu is an Associate Professor of Computer Science and Information Systems at Youngstown State University. He conducts a research lab called Data Lab, focusing on data-oriented sciences. His primary research interests include database management systems, big data, and cloud computing. He developed a new storage framework to speed up data processing on a next-generation database called a column-store database. He leads multiple cloud
[20] Ronald Sam Lightstone, Guy Lohman, Ippokratis computing projects at YSU including the STEM Cloud. As a Pandis, Vijayshankar Raman, Richard Sidle, G. campus champion of NSF XSEDE, a powerful collection of Attaluri, Naresh Chainani, Barber, and David Sharpe, supercomputing centers, Dr. Yu has organized at YSU multiple “Memory-Efficient Hash Joins,” Proceedings of the nationwide educational workshops on high-performance VLDB Endowment, 8(4):353-364, 2014. computing and big data.


Wen-Chi Hou received the MS and PhD degrees in Computer Science and Engineering from Case Western Reserve University in 1989. He is presently a Professor of Computer Science at Southern Illinois University at Carbondale. His interests include statistical databases, query optimization, concurrency control, and XML databases.

Novel Low Latency Load Shared Multicore Multicasting Schemes – An Extension to Core Migration

Bidyut Gupta*, Ashraf Alyanbaawi* Southern Illinois University, Carbondale, USA

Nick Rahimi* Southeast Missouri State University, Cape Girardeau, USA

Koushik Sinha* Southern Illinois University, Carbondale, USA

Ziping Liu* Southeast Missouri State University, Cape Girardeau, USA

Abstract

The shared tree multicast approach is preferred over source-based tree multicast because the latter requires constructing a minimum cost tree per source, unlike the single shared tree of the former approach. However, in the shared tree approach a single core must handle the entire traffic load, resulting in degraded multicast performance; it also suffers from single point of failure. In this paper, we present four novel and efficient schemes for load shared multicore multicasting in static networks. While the first two schemes emphasize load sharing, the last two aim at achieving low latency multicasting along with load sharing for delay-sensitive multicast applications. We also present a unique approach for core migration, which uses two very important parameters, namely the depth of a core tree and the pseudo diameter of a core. One noteworthy point from the viewpoint of fault tolerance is that the degree of fault tolerance can be enhanced from covering single point failure to any number of core failures.

Key Words: Core selection, pseudo diameter, multicore multicasting, load sharing, core migration.

1 Introduction

Multicast communication in the Internet [3] uses either the idea of source-based trees [1, 21, 26] or the idea of core-based trees (CBT) [2]. One major problem of the source-based-tree multicasting approach is that each router in the tree has to keep the (source, group) pair information, and there is one tree per source. In reality, the Internet is a complex heterogeneous environment that potentially has to support many thousands of simultaneously active multicast groups, the majority of which are usually sparsely distributed. Therefore, this technique clearly does not scale well. It has been observed that shared tree-based approaches such as CBT [2, 17] and protocol independent multicasting – sparse mode (PIM-SM) [8] offer an improvement in scalability by a factor of the number of active sources.

In the core-based tree/shared tree approach [2], the tree branches are made up of other routers, the so-called non-core routers, which form a shortest path between a member-host's directly attached router and the core. A core need not be topologically centered, because multicasts vary in nature and the geographical locations of the receivers (member-hosts) can be anywhere in the Internet; therefore, the structure of a core-based tree can vary [2] as well. CBT is attractive compared to source-based trees because of its key architectural features like scaling, tree creation, and unicast routing separation. The major concerns of the shared tree approach are core selection and the core as a single point of failure. Core selection [2, 10] is the problem of appropriate placement of a core or cores in a network for the purpose of improving the performance of the tree(s) constructed around these core(s).

In static networks, any core selection scheme depends on knowledge of the entire network topology and involves all routers in the network. There exist several important works [12, 21, 24-25] which take network topology into account while selecting a core. The Maximum Path Count (MPC) core selection method [12] needs the complete topology to calculate the shortest paths for all pairs; the nodes are then sorted in descending order of their path counts, and the first nodes are selected as the candidate cores. The Delay Variant Multicast Algorithm (DVMA) assumes that the complete topology is available at each node [21]. It works on the principle of k-shortest paths to the group of destination nodes concerned; if these paths do not satisfy a delay constraint, it may find a longer path, which is a shortfall of DVMA. The Optimal Cost Based Tree (OCBT) approach [23-25] calculates the cost of the tree rooted at each router in the network and selects the root that gives the lowest maximum delay over all other roots with lowest cost. It, too, needs knowledge of the whole topology.

______
* Department of Computer Science.

ISCA Copyright© 2018

1.1 Related Works

In the case of a single core-tree based multicast, the core has to handle all traffic load, which degrades performance. To overcome the problem, shared tree multicast using multiple cores is the only solution: it distributes the total traffic load over the cores, resulting in improved load balancing and thereby improved multicast performance. There exist in the literature some important contributions in this area of multicore-based multicasting [11, 23-25, 27]; the goal of these works is to achieve load balancing. In [11], Jia et al. have presented a Multiple Shared-Trees (MST) based approach to multicast routing. In their approach, the tree roots are formed into an anycast group so that multicast packets can be anycast to the nearest node of one of the shared trees. However, load balancing is at the level of each source choosing the best core closest to it, rather than attempting to utilize all the candidate cores simultaneously. This may lead to congestion in a core if multiple sources choose that core based on their shortest delay metric.

In [23], a unique tree consisting of multiple cores is maintained, with one of the cores being the root. The objective of the work is to develop a loop-free multi-core based tree by assigning level numbers to the cores and to the nodes joining the tree, to help maintain the tree structure. The cores need to coordinate with one another for their operations.

Zappala et al. [27] have considered two different approaches for multicore load shared multicasting. The first one is the senders-to-many scheme; it partitions the receivers of a group among the trees rooted at different cores so that each receiver is on exactly one core tree at a time. Therefore, one core tree spans only some of the group members. Even though it requires less state, it has the complex task of taking care of newly arriving group members, i.e., partitioning them appropriately among the different core trees. In their second scheme, each core tree spans the entire receiver group. To transmit data, different senders in a multicast group can use different trees according to the proximity of the source to the tree; this helps balance the traffic load and improve performance. The trees are maintained independently, unlike the work in [23]. The core distribution method follows the hash-based scheme of [5]. Note that a different kind of load sharing (not load balancing) approach exists for splitting the load over Equal Cost Multipath (ECMP): multicast traffic from different sources, or from different sources and groups, is load split across equal-cost paths, provided such paths exist and are recognized [18].

The idea of core-based multicasting has been extended to allow migration of cores [4]. When the performance at an alternate core is 'reasonably' better than that at the current core, migration takes place to the alternate core. This helps in controlling multicast latency and load sharing.

It may be noted that several important contributions exist [9, 19, 22] which have attempted to solve the problem of core selection, with varied levels of complexity, for delay and delay-variation constrained multicasting in static networks. These works aim mainly at reducing communication delay to satisfy the delay specifications of applications; they do not consider load sharing among multiple cores, which is a problem of immense importance in multicasting, particularly when considering the simultaneous presence of many different multicast sessions.

1.2 Our Contribution

In this paper, we have considered shared tree multicast with multiple cores. The motivation of the work is to improve multicast performance via load sharing. We prefer the term 'sharing' to 'balancing' because the objective of the work is to engage all cores whenever possible, and more so because it is neither possible to know a priori the duration of the different multicast sessions running at a given time, nor the number of possible senders per multicast group. In this paper, we have adopted the multiple core-selection scheme reported in [7] to design the proposed load-sharing algorithms. In this context, note that finding an optimal placement for multiple cores is more complex than for a single core. The problem can be viewed as follows: given the maximum distance d from a node to a core, determine the smallest number of cores that satisfy this criterion. This minimum d-dominating set problem is NP-hard [6]. To avoid such complexity in the core selection process, a practical approach for multiple core selection with complexity O(n2) has been considered in [7].

A brief sketch of the proposed work is as follows. The core selection process in [7] is different from the ones in [5] and [27] and uses the idea of pseudo diameter [13-15]. Cores are determined statically using the routing information of all routers in the network. It is independent of any underlying unicast routing protocol active in the domain; the metric used in the determination is the one used in the unicast routing tables of the routers. The best core, i.e., the core with the minimum pseudo diameter, is the primary core. We have proposed four novel load sharing schemes and a core migration scheme. The first two load sharing schemes neither partition the receivers among the different core trees nor build a unique tree consisting of multiple cores [23, 27]. In the first one, a primary core-based tree spans all groups, while each of the other core trees spans all members of one existing group only; however, a non-primary core tree may span multiple groups, depending on the number of active groups and the number of selected cores. In the second scheme, a single core tree will never span all groups. In our approaches, a sender does not decide which core to send packets to, unlike in [27]; rather, any source must send its first packet to the primary core, and the primary core then decides whether the sender needs to send the rest of its packets to any other core. We have discussed their relative merits.

The third and the fourth schemes are low latency load shared schemes for delay-sensitive applications. We have considered the maximum path (depth) of a core-based tree as a parameter, in addition to the pseudo diameter, to achieve low latency load sharing in multicasting. To the best of our knowledge, no existing work in this direction considers these two parameters together. The major difference between these two and the first two schemes is that

in these two schemes a sender selects a core for multicasting its packets, unlike in the first two; different senders, whether belonging to the same group or not, may select different cores. Therefore, from the load sharing viewpoint, the last two schemes may outperform the first two; however, they carry the extra overhead that every core tree must span all groups. Finally, we have presented a simple yet efficient core migration scheme which is based on both depth and pseudo diameter.

The organization of the paper is as follows. In Section 2, we briefly state the existing concept of pseudo diameter [13-15]; we also state the multicore selection scheme [7] used in this paper, for a better understanding of the proposed load sharing schemes. In Section 3, we present the proposed multi-core group-based load-sharing multicast schemes. In Section 4, we present the two low latency schemes along with brief comparisons of all four multicast schemes. In Section 5, we present the core migration scheme. Finally, Section 6 draws the conclusion.

2 Preliminaries

Two widely used unicast routing protocols are distance vector routing (DVR) and link state routing (LSR). In the former, routers do not have knowledge of the network topology, whereas in the latter they do. The concept of pseudo diameter is independent of the underlying unicast routing protocol. We denote the unicast routing table of a router ri by UCTi; it can be either the DVR table or the LSR table of the router, depending on the unicast protocol used. The pseudo diameter of a router ri, denoted Pd(ri), is defined as follows [13-15]:

Pd(ri) = max {ci,j}, where ci,j = cost(ri, rj), 1 ≤ j ≤ n, j ≠ i, ci,j ∈ UCTi, and n is the number of routers in the network.

In words, the pseudo diameter of router ri is the maximum value among the costs (as present in its routing table UCTi) to reach all other routers in the network from ri. The implication of pseudo diameter is that any other router is reachable from ri within a distance (number of hops, delay, etc.) equal to Pd(ri); it thus directly relates to the physical location of router ri. Pseudo diameter is not the actual diameter of the network, because it depends on the location of ri in the network, so different routers may have different pseudo diameter values. Therefore, a pseudo diameter Pd is always less than or equal to the actual diameter of the network.

As an example, consider the network shown in Figure 1. Without any loss of generality, we have considered DVR as the underlying unicast protocol used in the network. The respective DVR tables of the routers appear in Figure 2. Note that the diameter of the network is 90. From router A's table, its pseudo diameter is 90, which equals the network diameter, whereas for router C it is 70, as seen from C's DVR table. This means that if C is the source of a communication, the maximum cost to reach any other router is 70, which is less than the network diameter of 90. In this context, the following observations [15] are worth mentioning.

2.1 Multicore Selection Scheme

We briefly present a systematic approach [7] to select the cores to be used for multicore multicasting. Cores are selected statically using the routing information of all routers in the underlying unicast domain; this core selection is independent of any multicast group. For multicore multicast, each router ri executes the following steps to select the required number of cores, m.

Multicore selection process
1. Router ri determines its Pd(ri) from its unicast routing table.
2. It broadcasts Pd(ri) in the network using pseudo diameter based broadcasting [15].
3. Router ri receives Pd(rj) from every other router rj, 1 ≤ j ≤ n, j ≠ i.
4. Router ri creates a list of m routers out of all n routers, sorted in ascending order of their Pd values.  // the first router in the m-router list is the primary core

Remark 1. Every router creates an identical sorted list of the m cores.
Remark 2. The router with minimum Pd is the primary core.
Remark 3. The message complexity of the core selection process is O(n2).
Remark 4. In the core selection process, a router does not need to know the entire topology.
Observation 1. For fault-tolerant single-core multicast, there are (m-1) redundant (standby) cores.
Observation 2a. To guard against any possible core failures while using the m cores for multicasting, a total of 2m cores can be selected by the multicore selection process.
Observation 2b. By position in the sorted list, all odd-numbered cores can be used for multicasting. Every odd-numbered core has as its standby the even-numbered core that immediately follows it in the sorted list; this standby selection exploits the proximity of the two cores from the viewpoint of their pseudo diameter values.

2.2 An Example

Consider the example network of Figure 1. The primary core, in our approach, is the router with the least pseudo diameter value among all routers in the network. From the DVR tables (Figure 2), the pseudo diameters of all routers can be obtained: the pseudo diameters of routers A, B, C, D, E, F, G, and H are 90, 90, 70, 80, 60, 90, 80, and 80, respectively.

Each router broadcasts its Pd value to all other routers in the network. At the end of the broadcast, each router holds the same sorted list of the routers based on their respective pseudo diameters, shown in Figure 3. Observe in the figure that more than one router can have the same Pd value: routers D, G, and H all have Pd = 80. When this situation arises, the tied routers are sorted in descending order of their router Ids (IP addresses). Without any loss of generality, we assume that router H has the highest router Id, followed by routers G and D, respectively; so router H has priority over routers G and D as a candidate core. According to Figure 3, for the network of Figure 1, router E is the primary core, as it has the least pseudo diameter value, 60. For multicore multicasting, the first few routers of the list are selected as needed. To incorporate core redundancy in the case of single-core multicast, router C can be chosen as the secondary (standby) core, as it has the next least pseudo diameter. This makes the single-core scheme fault tolerant; more cores can act as standbys for a larger degree of fault tolerance.

Figure 1: An 8-router network

Figure 2: DVR tables of the routers
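As a concrete illustration, the ranking and tie-breaking of the multicore selection process can be sketched as follows. The Pd values are those listed in Section 2.2 for the example network of Figure 1; the function name and the single-letter stand-ins for router IP addresses are illustrative only, not part of the paper's scheme.

```python
# Sketch of step 4 of the multicore selection process (Section 2.1),
# assuming each router has already learned every other router's pseudo
# diameter via the broadcast of steps 2-3. Pd values are those of the
# 8-router example network of Figure 1.
pd = {'A': 90, 'B': 90, 'C': 70, 'D': 80, 'E': 60, 'F': 90, 'G': 80, 'H': 80}

def select_cores(pd_values, m):
    # Ascending order of Pd; ties broken by descending router Id
    # (the paper uses IP addresses; single letters stand in here).
    ranked = sorted(pd_values, key=lambda r: (pd_values[r], -ord(r)))
    return ranked[:m]

cores = select_cores(pd, m=4)
# cores[0] is the primary core: router E (Pd = 60);
# cores[1] is router C (Pd = 70), the natural standby core;
# the Pd = 80 tie resolves to H before G before D.
```

Because every router computes the same sorted list from the same inputs, all routers agree on the primary core without further coordination (Remark 1).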

2.3 Performance

The multicore selection scheme needs only one broadcast, independent of the number of cores to be used for multicasting. Therefore, the message complexity is O(n2), where n is the total number of nodes in the network. However, in this approach a router does not need to have the complete topological information. Note that, to incorporate the effect of any changes in the network, e.g., a router failure, this core selection process can be run periodically. Besides, the core selection scheme is independent of any multicast group, unlike most existing works, because it is a static core selection approach. A detailed

Figure 3: Sorted list based on the pseudo diameters

discussion of the performance of the presented core selection method has appeared in [7]. The core selection scheme has been compared with some important existing core selection algorithms, mainly the Maximum Path Count (MPC) method [12], the Delay Variant Multicast Algorithm (DVMA) [21], the Minimum Average Distance (MAD) method, and the OptTree method [12]. Results of the comparisons for randomly generated networks of different sizes are shown in Figures 4-6. The superiority of our approach over these other approaches from the viewpoint of message complexity is evident from the results. Note that, unlike in the OptTree method, in our approach a router does not need to have the complete topological information. The experimental setup used the NS2 simulator, the BRITE topology generator, and Waxman's probability model for interconnection of the nodes.

Figure 4: Message complexity vs. number of group members, 30-router network (MPC/MAD, DVMA, OptTree, and static core selection)

Figure 5: Message complexity vs. number of group members, 40-router network

Figure 6: Message complexity vs. number of group members, 50-router network

3 Group-Based Load Shared Multicore Multicast

In this section, we present two simple yet very effective group-based load-shared multicast schemes, GLSM-Cast1 and GLSM-Cast2, in which groups are distributed to different cores. Each core tree spans all members of a single existing group; it may also span multiple groups, depending on the number of active groups and the number of selected cores. However, a single core tree will never span all groups. Cores are determined statically using the routing information of all routers in the underlying unicast domain [7]; the method is independent of any underlying unicast protocol active in the domain. The best core, i.e., the core with the minimum pseudo diameter, is called the primary core. In our approach, a sender does not decide which core to send packets to, unlike in [27]; rather, any source must send its first packet to the primary core, and the primary core then decides whether the sender needs to send the rest of its packets to any other core.

We use the following notation in the proposed schemes. We assume that there are m cores present in the system. C0 denotes the primary core, and Ci denotes the (i+1)th core; Snew denotes a new sender for some group. In addition, P denotes the number of distinct multicast groups at a given time, and N is an integer variable initialized to 0. We assume that the primary core C0 maintains a dynamically growing linear array of integers, say A[]. At any given time, the number of elements in the array equals the number of active multicast groups in the network; A[1] is 0 and corresponds to the first active group. The kth entry of the array corresponds to the kth newly active group; primary core C0 sets A[k] to (k-1) when it receives the first multicast packet for the kth newly active group. We now present the two schemes.
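The primary core's bookkeeping described above can be sketched as follows; the class and method names are hypothetical illustrations, not identifiers from the paper.

```python
# Sketch of the primary core's bookkeeping in GLSM-Cast1/2 (Section 3):
# the k-th newly active group stores A[k] = k-1, and the serving core is
# chosen as Cj = N mod m. Class and method names are illustrative.
class PrimaryCore:
    def __init__(self, m):
        self.m = m      # number of cores C0 .. C(m-1)
        self.A = {}     # group id -> stored integer (k-th new group stores k-1)

    def first_packet(self, group_id):
        # Executed when the 1st multicast packet of a sender arrives;
        # returns the j of the reply 'continue with core Cj'.
        if group_id not in self.A:
            self.A[group_id] = len(self.A)   # k-th newly active group -> k-1
        return self.A[group_id] % self.m     # Cj = N mod m

c0 = PrimaryCore(m=3)
cores = [c0.first_packet(g) for g in ['g1', 'g2', 'g3', 'g4', 'g2']]
# g1..g4 are assigned cores 0, 1, 2, 0 in round-robin;
# a later sender for an already-active group (g2) is directed to the same core.
```

The mapping engages all m cores as soon as at least m groups are active, which is the sense in which the schemes "share" rather than "balance" load.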

GLSM-Cast1

Sender Snew (executed by each new sender)
1. sends the 1st multicast packet to C0
2. receives the message 'continue with core Cj'
3. sends the rest of the packets to Cj

Primary core C0 (executed by the primary core)
1. receives the 1st multicast packet from Snew
2. if the corresponding ith group entry exists in A[]
       N = A[i]            // A[i] = i-1
       Cj = N mod m        // selects the core for Snew
       multicasts the packet in the C0-based tree
       unicasts 'continue with core Cj' to Snew
   else
       A[i] = i-1          // A[] grows dynamically; ith newly active group
       N = A[i]
       Cj = N mod m        // selects the core for Snew
       multicasts the packet in the C0-based tree
       unicasts 'continue with core Cj' to Snew

Core Cj
1. multicasts the packets received from Snew

Let us now estimate the extra traffic load on the primary core. Without any loss of generality, let the average number of senders per group be n; so the total number of senders is nP. The primary core always has to send the message 'continue with core Cj', so there are nP unicasts from the primary core to the sources. In addition, we need to consider the following cases. Let P = km + n', where k is an integer, 0 ≤ k, and n' < m.

Case 1: P ≤ m — the number of extra single-packet multicasts by C0 is n(P-1)
Case 2: P > m — the number of extra single-packet multicasts by C0 is k(m-1)n + (n'-1)n

Observe that for GLSM-Cast1 to work correctly, only the primary core (C0) tree spans all members of all groups. Next, we state the second scheme.

GLSM-Cast2

Sender Snew (executed by each new sender)
1. sends the 1st multicast packet to C0
2. receives the message 'continue with core Cj'
3. sends the rest of the packets to Cj

Primary core C0 (executed by the primary core)
1. receives the 1st multicast packet from Snew
2. if the corresponding ith group entry exists in A[]
       N = A[i]            // A[i] = i-1
       Cj = N mod m        // selects the core for Snew
       if Cj ≠ C0
           C0 unicasts the packet to Cj
           unicasts 'continue with core Cj' to Snew
       else
           C0 multicasts the packet
           unicasts 'continue with core Cj' to Snew
   else
       A[i] = i-1          // A[] grows dynamically; ith newly active group
       N = A[i]
       Cj = N mod m        // selects the core for Snew
       if Cj ≠ C0
           C0 unicasts the packet to Cj
           unicasts 'continue with core Cj' to Snew
       else
           C0 multicasts the packet
           unicasts 'continue with core Cj' to Snew

Core Cj
1. multicasts the packet received from C0
2. multicasts the packets received from Snew

Let us now estimate the extra traffic load on the primary core. As in GLSM-Cast1, the primary core always has to send the message 'continue with core Cj'; hence there are nP unicasts from the primary core to the sources. In addition, we need to consider the following:

Case 1: P ≤ m — the number of unicasts by C0 to other cores is n(P-1)
Case 2: P > m — the number of unicasts by C0 to other cores is k(m-1)n + (n'-1)n

Relative Merits: We now briefly discuss the bandwidth consumption in the network caused by the two schemes, as well as the load they place on the primary core. We have shown above that in GLSM-Cast1 the primary core has to execute extra multicasts, which are absent in GLSM-Cast2. Note that the total number of multicasts remains the same in both methods; therefore, the overall bandwidth consumption in the network, strictly due to multicasting, remains the same in both schemes. However, in GLSM-Cast2 the number of extra unicasts done by C0 compared to GLSM-Cast1 is [k(m-1)n + (n'-1)n]. Therefore, the overall bandwidth consumption in the network is higher in GLSM-Cast2 than in GLSM-Cast1. However, the actual load on C0 in GLSM-Cast2 is lower, as it is not involved in any extra single-packet multicast. In addition, the primary core tree does not span all groups, unlike in the first scheme.
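The case counts above can be checked numerically. The sketch below assumes, consistent with the schemes, that the ith newly active group is served by core (i-1) mod m and that every sender of a group served by a non-primary core costs C0 one extra single-packet transmission; the helper name is illustrative.

```python
# Numeric check (a sketch) of the extra-load counts of Section 3.
# Groups are indexed 0..P-1; group g is served by core g mod m. Each of
# the n senders of a group whose serving core is not C0 costs the
# primary core one extra single-packet transmission.
def extra_by_c0(P, m, n):
    return n * sum(1 for g in range(P) if g % m != 0)

m, n = 4, 3
# Case 1: P <= m  ->  n(P-1)
assert extra_by_c0(3, m, n) == n * (3 - 1)
# Case 2: P = k*m + n'  ->  k(m-1)n + (n'-1)n
k, n_prime = 2, 3
P = k * m + n_prime
assert extra_by_c0(P, m, n) == k * (m - 1) * n + (n_prime - 1) * n
```

Both closed-form expressions match the direct count, since exactly k(m-1) + (n'-1) of the P group indices are nonzero modulo m when n' ≥ 1.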

4 Low-Latency Load Shared Approach

In the GLSM-Cast1 and GLSM-Cast2 schemes, the underlying assumption is that the cost (delay, number of hops, etc.) between the root router (core) and a leaf router (a group member) is the pseudo diameter of the core. In reality, this assumption may or may not be true; it depends on the physical locations of the routers connected to the group members. The above two schemes have not considered the cost between a source router ri (i.e., a router connected to a multicast source) and the core Ci which ri uses for multicasting its packets. For cost (especially delay) sensitive applications, this last factor should be considered when selecting the core, in addition to the core's pseudo diameter, to achieve low latency in multicasting. In this section, we propose two such schemes in which a sender itself selects a core for multicasting, instead of taking the help of the primary core, unlike the schemes presented in the previous section. In fact, different senders belonging to the same group will possibly select different cores; the objective is to achieve low latency in multicasting with load sharing over the cores. Therefore, the following two approaches are more general in nature and have the potential of outperforming the previous two schemes. Before we state the proposed approaches, we define the following.

Definition a. Costα(Snew, Cj) = Cost(Snew, Cj) + Pd(Cj)
Definition b. Costβ(Snew, Cj) = Cost(Snew, Cj) + dj

In the second definition, dj denotes the longest path of the core tree rooted at Cj; we name dj the depth of the tree. Note that dj ≤ Pd(Cj); we shall elaborate on it further when we present the second low latency approach.

We have already established that every router has the same sorted list of m candidate cores. Let us denote this list as L, L = < C0, C1, C2, …, Cm-1 >, such that Pd(Ci) ≥ Pd(Ci-1), 1 ≤ i ≤ m-1.
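Definitions a and b, together with the depth list used by the second low latency scheme, can be illustrated with a small sketch; all cost, Pd, and leaf-cost values below are hypothetical, and the variable names are illustrative.

```python
# Sketch of the two cost metrics (Definitions a and b) and of the depth
# bookkeeping of LLM-Cast2 (Section 4.2). All numeric values are
# hypothetical; 'C0'..'C2' stand for candidate cores in the list L.
leaf_costs = {'C0': [10, 35, 55], 'C1': [20, 30], 'C2': [15, 60, 80]}
depth = {c: max(v) for c, v in leaf_costs.items()}   # d_j = max cost to a leaf
pd    = {'C0': 60, 'C1': 70, 'C2': 80}               # pseudo diameters (d_j <= Pd)
cost  = {'C0': 25, 'C1': 40, 'C2': 10}               # Cost(Snew, Cj), assumed

cost_alpha = {c: cost[c] + pd[c] for c in cost}      # Definition a
cost_beta  = {c: cost[c] + depth[c] for c in cost}   # Definition b

L_prime = sorted(depth.values())                     # the common sorted depth list L'

ck = min(cost_alpha, key=cost_alpha.get)             # minimizer of Cost^alpha
cr = min(cost_beta, key=cost_beta.get)               # minimizer of Cost^beta
chosen = ck if cost_alpha[ck] <= cost_beta[cr] else cr
```

With these values the depth-based metric picks a different core than the pseudo-diameter-based one, which is exactly the situation where accounting for tree depth pays off.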

4.1 Low Latency Approach 1

We assume that Pd(Cj) = dj for each candidate core. The algorithm is executed by each new source.

LLM-Cast1

Sender Snew (executed by each new sender)
1. Snew computes Costα(Snew, Cj) for all Cj in L
2. Snew identifies the core Ci such that Costα(Snew, Ci) is minimum
3. Snew unicasts packets to Ci for multicast

4.2 Low Latency Approach 2

In the second approach, we consider the general case, that is, the depth of a core tree rooted at a core Ci is less than or equal to the pseudo diameter of the core: Pd(Ci) ≥ di. We assume that each leaf router rj unicasts a hello packet to its core Ci. Core Ci then determines Cost(rj, Ci) from its unicast routing table and computes max {Cost(rj, Ci)} over all its leaf routers; this is its depth di. The proposed second approach (LLM-Cast2) needs each core to create a list L', which is done in the following way.

Each core Ci executes the following algorithm.
1. Ci unicasts its di to all Cj in L, j ≠ i
2. Ci receives each dj
3. Ci creates the list L' = < dr, dr+1, dr+2, …, ds >, where dr is the depth of the core tree rooted at Cr and dr ≤ dr+1 ≤ dr+2 ≤ … ≤ ds; |L'| = m

Remark 5. Every core Ci creates an identical sorted list of the depths (L').

We observe that LLM-Cast2 is likely to offer better low latency compared to LLM-Cast1; however, in LLM-Cast2 some extra information/computation is needed to form the list L'. The relative merits of the two approaches are stated in more detail in Subsection 4.3, and in Subsection 4.4 we consider all four multicast approaches to discuss their relative merits.

LLM-Cast2

Sender Snew (executed by each new sender)
1. Snew unicasts a request packet, to obtain the list L', to the core Cj for which Cost(Snew, Cj) is minimum over all Cj in L  // this can be done via anycast as well
2. Snew determines the core Ck such that Costα(Snew, Ck) = min {Costα(Snew, Cj)} for all Cj in L
3. Snew determines Cr such that Costβ(Snew, Cr) = min {Costβ(Snew, Cj)} for all Cj in L'
4. if Costα(Snew, Ck) ≤ Costβ(Snew, Cr), Snew unicasts packets to Ck; else Snew unicasts packets to Cr

4.3 Relative Merits of LLM-Cast1 and LLM-Cast2

In the LLM-Cast1 approach, one underlying assumption is that there exist receiver(s) in the core-based tree at a distance from the core equal to the pseudo diameter of the core. This assumption may or may not be true; it depends solely on the geographical locations of the receivers, and it may lead to inaccurate computation of the costs. Hence, multicast communication speed may be negatively affected. The LLM-Cast2 approach attempts to fine-tune this aspect of LLM-Cast1 by achieving more accurate cost computation, resulting in speedier multicast communication than the LLM-Cast1 approach. For this, it needs to determine the depth of each core tree and therefore requires some additional unicasts compared to LLM-Cast1. The number of such unicasts, say Y, is computed as follows: Y = m(m-1) + 2X, where m is the number of cores and X is the number of senders (multicast sources). Assuming m is given, Y varies linearly with X, and so does the extra bandwidth consumption compared to LLM-Cast1.

Remark 6. LLM-Cast2 offers more accurate computation of cost than LLM-Cast1, resulting in speedier multicast communication than the LLM-Cast1 approach.
Remark 7. LLM-Cast1 is slower but more bandwidth efficient than the LLM-Cast2 approach.

We observe that LLM-Cast2 is likely to offer better low We consider the following aspects for comparison: load latency compared to LLM-Cast1; however, in LLM-Cast2, some shared per core, bandwidth consumption, and multicast extra information/computation is needed to form the list L'. communication speed. Relative merits of the above two approaches are stated in more detail in the following subsection. In subsection 4.4, we have Load shared per core: In the LLM-Cast1 and LLM-Cast2 IJCA, Vol. 25, No. 3, Sept. 2018 139

approaches, a source (sender) decides which core it will use for otherwise increases delay further in multicasting, a threshold δ multicasting independent of any other sender. It adds an can be set up so that when new_cost – current_cost ≥ δ , source element of randomness in selecting a core for multicasting S* migrates to an alternate core, say Cj, which offers the which is absent in the GLSM-Cast1 and GLSM-Cast2 minimum cost compared to any other candidate core. For the approaches; therefore, both LLM-Cast1 and LLM-Cast2 proposed core migration algorithm to work correctly, it is approaches offer better load balancing than both GLSM-Cast1 required that every core Ci monitor its depth di; the regularity of and GLSM-Cast2 approaches. this probing and this threshold value are dictated by both Bandwidth consumption: We discuss this aspect from two application behavior and the overhead cost associated with viewpoints: number of trees rooted at the same core and number moving to a different core. We now state the algorithm. of unicasts needed in each approach. Algorithm Core-Migration (a) We start with the first one. In the GLSM-Cast1 approach, Executed by each core Ci the primary core spans over all existing groups and all other 1. Ci unicasts its new depth di to each Cj in L cores span over only some subsets of the groups. In the // new_cost – current_cost ≥ δ GLSM-Cast2 approach, all cores including the primary one 2. Ci creates the new list L' * span over only some subsets of the groups. In the LLM- 3. Ci unicasts the list L' to S * Cast1 and LLM-Cast2 approaches, each core spans over all // Ci is the current core for S existing groups. It causes more bandwidth consumption Executed by source S* than in the two GLSM approaches. Besides, it also causes 1. S* receives the list L' a larger number of memory states to be created and 2. 
S* determines the core Cl such that maintained by the routers present in the different trees new_cost (Cl) = min {current_cost (Cr)} for any compared to the two GLSM approaches. Cr in L, r ≠ l (b) We now consider the effect on bandwidth consumption due // identical computations as in steps 2 to 4 in to unicasts. Let N1, N2, N3, and N4 denote the respective LLM-Cast2 total number of unicasts in the four approaches, namely 3. S* unicasts packets to Cl for multicasting GLSM-Cast1, GLSM-Cast2, LLM-Cast1, and LLM-Cast2. // core migration takes place Then from the four algorithms we observe that N1 = 2N3 and N2 = 3N3. Also, N4 is approximately the It may be noted that joining/leaving of group members may same as N1 when the number of cores is small and is less cause changes in the depths of more than one core than N2 as long as the number of senders is much larger than simultaneously. It may take some time for the convergence, i.e., the number of cores, which is usually the case. This simple when every core will have identical L'. Therefore, we consider analysis suggests that the last two have better bandwidth the following in the above algorithm to tackle the problem of utilization; however, the tradeoff is the extra computation convergence: After building a new L' , a core Ci waits for a done by every sender to determine the best cost. small time interval (∆t) to incorporate any further possible updating of L' caused by some other cores. After ∆t, the core Ci Multicast communication speed: Faster communication is unicasts the final list L' to S* (step 3 of the pseudocode Executed achieved in the LLM-Cast1 and LLM-Cast2 approaches than the by each core Ci). two GLSM approaches, because they consider reduction of delay in addition to load sharing. 6 Conclusion

5 Core Migration In this paper, we have used an existing multicore selection scheme [7] for designing our proposed four novel load shared We extend the low-latency multicast schemes to allow approaches. This multicore selection scheme has been the migration of the cores. Such migration is needed when the choice because of its ease of implementation [7]. The proposed performance at an alternate core is ‘substantially’ better than load sharing schemes neither partition the receivers among the that at the current core. Below, we have stated the working different core-trees nor they build a unique tree consisting of principle of our proposed core migration approach before its multiple cores [23-25, 27]. In the first two schemes, a sender formal presentation. belonging to any group does not decide which core to send The depth di of a core tree with core Ci can change due to new packets to unlike in [27]; rather any source must send its first member(s) joining or existing member(s) leaving. Therefore, to packet to the primary core and the primary core then decides if maintain low latency in multicasting, an existing source, say S* the sender needs to send the rest of its packets to some other may need to migrate to a different core if it finds its ‘new cost’ core. In the other two, we have considered low latency load has increased substantially from its ‘current cost’, both shared schemes, which are suitable for delay-sensitive measured with respect to the current core, say Ci. It is applications. We have considered maximum path (depth) of a α * understood that these costs can be either Cost (S , Ci) or Cost core-based tree as a parameter in addition to the pseudo β * (S , Ci) as determined by S* (following LLM-Cast2 in diameter to achieve low latency. To the best of our knowledge, subsection 4.2). To prevent frequent migration, which there does not exist any work in this direction that considers the 140 IJCA, Vol. 25, No. 3, Sept. 2018
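To make the cost definitions and selection rules above concrete, the following sketch implements Definitions a and b and the LLM-Cast1/LLM-Cast2 core-selection steps. This is a hypothetical illustration only: the function names and the example costs, pseudo diameters, and depths are invented for the example; in the actual protocol, Cost (S, Cj) would come from the sender's unicast routing table.

```python
# Sketch of the LLM-Cast1 / LLM-Cast2 core-selection rules (Section 4).
# All names and numbers are hypothetical illustrations.

def cost_alpha(unicast_cost, pseudo_diameter):
    # Definition a: Cost^alpha(S, Cj) = Cost(S, Cj) + Pd(Cj)
    return unicast_cost + pseudo_diameter

def cost_beta(unicast_cost, depth):
    # Definition b: Cost^beta(S, Cj) = Cost(S, Cj) + dj
    return unicast_cost + depth

def llm_cast1(cores):
    # LLM-Cast1: pick the core minimizing Cost^alpha (assumes Pd(Cj) = dj)
    return min(cores, key=lambda c: cost_alpha(c["cost"], c["pd"]))

def llm_cast2(cores):
    # LLM-Cast2: compare the best Cost^alpha choice (over L) with the best
    # Cost^beta choice (using the depth list L') and keep the cheaper one.
    ck = min(cores, key=lambda c: cost_alpha(c["cost"], c["pd"]))
    cr = min(cores, key=lambda c: cost_beta(c["cost"], c["depth"]))
    if cost_alpha(ck["cost"], ck["pd"]) <= cost_beta(cr["cost"], cr["depth"]):
        return ck
    return cr

# Hypothetical candidate-core list L; note depth <= pd for every core.
L = [
    {"name": "C0", "cost": 4, "pd": 6, "depth": 3},
    {"name": "C1", "cost": 5, "pd": 4, "depth": 4},
    {"name": "C2", "cost": 7, "pd": 2, "depth": 2},
]

print(llm_cast1(L)["name"])  # minimizes Cost(S, Cj) + Pd(Cj)
print(llm_cast2(L)["name"])  # may pick a different core using the true depths
```

With these invented numbers, the two rules disagree: LLM-Cast1, seeing only pseudo diameters, overestimates the cost of C0, while LLM-Cast2's depth information reveals C0 as cheapest — the inaccuracy that subsection 4.3 attributes to LLM-Cast1.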

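Similarly, the migration threshold rule of Section 5 and the unicast count Y = m(m-1) + 2X of subsection 4.3 can be sketched as follows; again, all function names and numbers are hypothetical, and the interpretation of the two terms of Y (depth exchanges among cores, plus a request/reply pair per sender) is an assumption drawn from the algorithm descriptions.

```python
# Sketch of the core-migration threshold rule (Section 5) and the
# extra-unicast count Y = m(m-1) + 2X (subsection 4.3).
# All names and numbers are hypothetical.

def should_migrate(current_cost, new_cost, delta):
    # Migrate only when the cost increase reaches the threshold delta;
    # this prevents frequent (oscillating) migrations.
    return new_cost - current_cost >= delta

def pick_new_core(costs):
    # costs: hypothetical mapping core-name -> cost for source S*;
    # S* moves to the candidate core offering the minimum cost.
    return min(costs, key=costs.get)

def extra_unicasts(m, x):
    # Y = m(m-1) + 2X: plausibly m(m-1) depth exchanges among the m cores
    # plus two unicasts (request and reply for L') per sender.
    return m * (m - 1) + 2 * x

# Example: cost w.r.t. the current core rose from 9 to 13 with delta = 3,
# so S* migrates to the cheapest alternate core.
if should_migrate(current_cost=9, new_cost=13, delta=3):
    target = pick_new_core({"C0": 11, "C1": 8, "C2": 10})
    print("migrate to", target)

print("Y =", extra_unicasts(m=4, x=10))  # 4*3 + 2*10 = 32
```

The linear dependence of Y on the number of senders X, noted in subsection 4.3, is visible directly in the last term.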
The two low latency load shared schemes can outperform the first two from the viewpoint of better load sharing, since in the last two, any source can send to any core based on cost, thereby effectively adding an element of randomness in selecting a core for multicasting; however, this incurs extra overhead in that every core tree must span all groups. Finally, we have extended the low latency approach to design a simple yet efficient core migration scheme, which considers both the depth of a core tree and the pseudo diameter of the core.

From the viewpoint of fault tolerance, the degree of fault-tolerance can be enhanced from covering a single point of failure to any number of core failures, as is evident from Observations 1, 2a, and 2b.

References

[1] A. Adams, J. Nicholas, and W. Siadak, "Protocol Independent Multicast - Dense Mode (PIM-DM)", Internet Engineering Task Force (IETF), RFC 3973, January 2005.
[2] A. Ballardie, "Core Based Tree Multicast Routing Architecture", Internet Engineering Task Force (IETF), RFC 2201, September 1997.
[3] S. Deering and D. Cheriton, "Multicast Routing in Datagram Internetworks and Extended LANs", ACM Transactions on Computer Systems (TOCS), 8(2):85-110, May 1990.
[4] M. Donahoo and E. Zegura, "Core Migration for Dynamic Multicast Routing", Proc. Int'l Conf. Computer Communications and Networks, June 1996.
[5] D. Estrin, M. Handley, A. Helmy, and P. Huang, "A Dynamic Bootstrap Mechanism for Rendezvous-Based Multicast Routing", Proc. IEEE INFOCOM, pp. 1090-1098, March 1999.
[6] M. Garey and D. Johnson, Computers and Intractability, W. H. Freeman and Co., 1979.
[7] B. Gupta, A. Alyanbaawi, S. Rahimi, N. Rahimi, and K. Sinha, "An Efficient Approach for Load-Shared and Fault-Tolerant Multicore Shared Tree Multicasting", Proc. IEEE INDIN, Emden, Germany, pp. 937-943, July 2017.
[8] B. Fenner, M. Handley, H. Holbrook, and I. Kouvelas, "Protocol Independent Multicast - Sparse Mode (PIM-SM)", Internet Engineering Task Force (IETF), RFC 4601, August 2006.
[9] H. Harutyunyan and M. Terzian, "A Multi-Core Multicast Approach for Delay and Delay Variation Multicast Routing", Proc. IEEE 19th Int'l Conf. High Performance Computing and Communications, Bangkok, Thailand, pp. 154-161, December 2017.
[10] W. Jia, W. Zhao, D. Xuan, and G. Xu, "An Efficient Fault-Tolerant Multicast Routing Protocol with Core-Based Tree Techniques", IEEE Transactions on Parallel and Distributed Systems, 10(10):984-1000, October 1999.
[11] W. Jia, W. Tu, W. Zhao, and G. Xu, "Multi-Shared-Trees Based Multicast Routing Control Protocol Using Anycast Selection", The International Journal of Parallel, Emergent and Distributed Systems, 20(4):69-84, March 2005.
[12] A. Karaman and H. Hassanein, "Core Selection Algorithms in Multicast Routing - Comparative and Complexity Analysis", Computer Communications, 29(8):998-1014, May 2006.
[13] S. Koneru, B. Gupta, S. Rahimi, and Z. Liu, "Hierarchical Pruning to Improve Bandwidth Utilization of RPF-Based Broadcasting", Proc. IEEE Symposium on Computers and Communications (ISCC), Split, Croatia, pp. 96-100, July 2013.
[14] S. Koneru, B. Gupta, and N. Debnath, "A Novel DVR Based Multicast Routing Protocol with Hierarchical Pruning", International Journal of Computers and Their Applications (IJCA), 20(3):184-191, September 2013.
[15] S. Koneru, B. Gupta, S. Rahimi, Z. Liu, and N. Debnath, "A Highly Efficient RPF-Based Broadcast Protocol Using a New Two-Level Pruning Mechanism", Journal of Computational Science (JOCS), 5(3):645-652, March 2014.
[16] H. Lee and C. Youn, "Scalable Multicast Routing Algorithm for Delay Variation Constrained Minimum Cost Tree", Proc. IEEE ICC, New Orleans, LA, USA, 3:1343-1347, June 2000.
[17] H. Lin and S. Lai, "A Simple and Effective Core Placement Method for the Core Based Tree Multicast Routing Architecture", Proc. IEEE Int'l Conf. Performance, Computing, and Communications, Phoenix, AZ, USA, pp. 215-219, February 2000.
[18] "Load Splitting IP Multicast Traffic over ECMP", Cisco, www.cisco.com/c/en/us/td/docs/ios/12_4t/ip_mcast/configuration/guide/mctlsplt.html, March 2015.
[19] A. Ben Mnaouer, M. Aissa, and A. Belghith, "An Efficient Core Selection Algorithm for Delay and Delay-Variation Constrained Multicast Tree Design", Proc. Second Int'l Conf. Global Information Infrastructure Symposium (IEEE GIIS'09), pp. 281-285, June 2009.
[20] T. Pusateri, "Distance Vector Multicast Routing Protocol", Juniper Networks, Internet Engineering Task Force (IETF), draft-ietf-idmr-dvmrp-v3-11.txt, October 2003.
[21] G. Rouskas and I. Baldine, "Multicast Routing with End-to-End Delay and Delay Variation Constraints", IEEE Journal on Selected Areas in Communications, 15(3):346-356, April 1997.
[22] P. Sheu and S. Chen, "A Fast and Efficient Heuristic Algorithm for the Delay and Delay Variation-Bounded Multicast Tree Problem", Computer Communications, 25(8):825-833, May 2002.
[23] C. Shields and J. J. Garcia-Luna-Aceves, "The Ordered Core Based Tree Protocol", Proc. IEEE INFOCOM'97, Kobe, Japan, pp. 884-891, April 1997.
[24] Y. Shim and S. Kang, "New Center Location Algorithms for Shared Multicast Trees", Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2345:1045-1056, 2002.
[25] D. G. Thaler and C. V. Ravishankar, "Distributed Center-Location Algorithms", IEEE Journal on Selected Areas in Communications, 15(3):291-303, April 1997.
[26] D. Waitzman, C. Partridge, and S. Deering, "Distance Vector Multicast Routing Protocol (DVMRP)", Internet Engineering Task Force (IETF), RFC 1075, November 1988.
[27] D. Zappala, A. Fabbri, and V. Lo, "An Evaluation of Shared Multicast Trees with Multiple Active Cores", Journal of Telecommunication Systems, 19:461-479, March 2002.

Koushik Sinha is currently an Assistant Professor in the Department of Computer Science at Southern Illinois University, Carbondale. He is the co-author of the book Wireless Networks and Mobile Computing, published by CRC Press, Taylor and Francis Group, USA, in 2015. Prior to joining SIU, he was with the Social Computing Group of Qatar Computing Research Institute, Qatar, from 2013 to 2015. Previously, he was a Research Scientist at Hewlett-Packard Labs. He is a Senior Member of the IEEE. His current research focus is in the areas of 5G, wireless sensor networks, IoT, and Fog Computing.

Bidyut Gupta is a Professor at the Department of Computer Science, Southern Illinois University at Carbondale. He is a senior member of the IEEE. His current research interests include fault-tolerant mobile computing, design of communication protocols, design of P2P federation architecture, and network security.

Ziping Liu is a Professor in the Department of Computer Science at Southeast Missouri State University. Her research interests include wireless ad hoc network/sensor network, distributed computing, cloud computing, wireless network security, game development, game AI, mobile computing, and web development.

Ashraf Alyanbaawi was born in Madinah, The Kingdom of Saudi Arabia. Ashraf is a PhD Candidate in Southern Illinois University, Carbondale, IL, USA. After finishing his undergraduate degree at the University Tenaga Nasional, Malaysia, he started his career as a Teaching Assistant at Taibah University, Yanbu, Saudi Arabia. In 2016, he received his MS degree in Computer Science from Southern Illinois University. His research focus is in the area of computer networks.

Nick Rahimi is an Assistant Professor in the Computer Science Department at Southeast Missouri State University. His main research interests revolve around Computer and Network Security, Distributed Systems, Peer-to-Peer Networks and their Privacy, and Data Communication. He has earned two Bachelor of Science degrees in Software Engineering and Information Systems Technologies. Nick obtained his Master and Ph.D. degrees in Computer Science from Southern Illinois University.

Instructions For Authors

The International Journal of Computers and Their Applications is published multiple times a year with the purpose of providing a forum for state-of-the-art developments and research in the theory and design of computers, as well as current innovative activities in the applications of computers. In contrast to other journals, this journal focuses on emerging computer technologies with emphasis on the applicability to real world problems. Current areas of particular interest include, but are not limited to: architecture, networks, intelligent systems, parallel and distributed computing, software and information engineering, and computer applications (e.g., engineering, medicine, business, education, etc.). All papers are subject to peer review before selection.

A. Procedure for Submission of a Technical Paper for Consideration:

1. Email your manuscript to the Editor-in-Chief, Dr. Fred Harris, Jr. [email protected]. 2. Illustrations should be high quality (originals unnecessary). 3. Enclose a separate page for (or include in the email message) the preferred author and address for correspondence. Also, please include email, telephone, and fax information should further contact be needed.

B. Manuscript Style:

1. The text should be double-spaced (12 point or larger), single column, and single-sided on 8.5 X 11 inch pages. 2. An informative abstract of 100-250 words should be provided. 3. At least 5 keywords describing the paper topics should follow the abstract. 4. References (alphabetized by first author) should appear at the end of the paper, as follows: author(s), first initials followed by last name, title in quotation marks, periodical, volume, inclusive page numbers, month and year. 5. Figures should be captioned and referenced.

C. Submission of Accepted Manuscripts:

1. The final complete paper (with abstract, figures, tables, and keywords) satisfying Section B above in MS Word format should be submitted to the Editor-in-Chief. 2. The submission may be on a CD/DVD, or as an email attachment(s). The following electronic files should be included: • Paper text (required) • Bios (required for each author). Integrate at the end of the paper. • Author Photos (jpeg files are required by the printer) • Figures, Tables, Illustrations. These may be integrated into the paper text file or provided separately (jpeg, MS Word, PowerPoint, eps). 3. Specify on the CD/DVD label or in the email the word processor and version used, along with the title of the paper. 4. Authors are asked to sign an ISCA copyright form (http://www.isca-hq.org/j-copyright.htm), indicating that they are transferring the copyright to ISCA or declaring the work to be government-sponsored work in the public domain. Also, letters of permission for inclusion of non-original materials are required.

Publication Charges:

After a manuscript has been accepted for publication, the author will be invoiced for publication charges of $50 USD per page (in the final IJCA two-column format) to cover part of the cost of publication. For ISCA members, $100 of publication charges will be waived if requested.

January 2014 ISCA INTERNATIONAL JOURNAL OF COMPUTERS AND THEIR APPLICATIONS Vol. 25, No. 3, Sept. 2018