Data Synchronization for IITBombayX Platform Related Work and Background Architecture And Design Implementation Conclusion and Future Work

Data Synchronization for IITBombayX Platform

Alpesh Rathore

113050057

Under the guidance of Prof. D.B. Phatak

Computer Science Department, IIT Bombay


Outline

1. Data Synchronization for IITBombayX Platform: File Synchronization Techniques, Tools for File Synchronization, IITBombayX Platform
2. Related Work and Background: Related Work, Background
3. Architecture And Design: File Synchronization Admin Utility, File Synchronization Admin Web Application, Android Application Architecture, Use Case
4. Implementation: File Synchronization Web Application, Android Application Implementation, Technologies Used
5. Conclusion and Future Work


Problem Statement

Design a web-application-based solution for full-fledged file synchronization over a distributed set of servers in various colleges, so that faculty, students, or other members of an IITBombayX course can contribute course content that is verified and synced to a central server, from where it becomes available to members of the course across the globe. The solution should also provide additional facilities such as viewing history, periodic synchronization, and notifications.

File Synchronization

File synchronization means keeping two different locations, on the same or different storage devices, synchronized with each other.
Mirroring: one location is kept in sync with another, so that changes made at the source are reflected at the other end, but NOT vice versa.
Two-way: changes made at either location are propagated to the other, so both synced locations are identical after every synchronization.

Refs: [1, 2, 3]
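To make the difference between mirroring and two-way synchronization concrete, here is a minimal Java sketch of the decision a one-way mirror has to make. It assumes each side is summarized as a map from file name to a content fingerprint (for example a hash); the class and field names are hypothetical and not taken from any tool discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: plan a one-way mirror (source -> destination).
// Each map goes from file name to a content fingerprint such as a hash.
final class MirrorPlan {
    final List<String> toCopy = new ArrayList<>();   // new or changed at the source
    final List<String> toDelete = new ArrayList<>(); // present only at the destination

    static MirrorPlan compute(Map<String, String> source, Map<String, String> destination) {
        MirrorPlan plan = new MirrorPlan();
        // Sorted iteration keeps the plan deterministic.
        for (String name : new TreeSet<>(source.keySet())) {
            // A missing or different destination fingerprint means the file must be copied.
            if (!source.get(name).equals(destination.get(name))) {
                plan.toCopy.add(name);
            }
        }
        for (String name : new TreeSet<>(destination.keySet())) {
            // Files the source does not have are removed so the destination matches it.
            if (!source.containsKey(name)) {
                plan.toDelete.add(name);
            }
        }
        return plan;
    }
}
```

Changes made only at the destination are overwritten or deleted and never travel back, which is exactly what separates mirroring from two-way synchronization; a two-way tool additionally has to remember the state after the last synchronization to tell which side changed a file.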

File Synchronization Techniques

• Delta differencing: transfer only the changes, not the whole content as FTP does.
• Security and encryption.
• Compression.
• Syncing multiple locations.
• Reporting of progress, failures, successes, performance, bandwidth utilization, etc.

Tools for File Synchronization

Several file synchronization tools are available.
Unison: Unison is open source (under the GNU GPL) and works on both Windows and Linux. It supports synchronization in both directions, i.e., two-way synchronization, and offers features such as a star topology for synchronizing multiple machines.
RSync: RSync is a more commonly used file synchronization tool, which is also available with the IBM AIX distribution. RSync was initially developed for Linux but has since been ported to Windows. It is highly configurable, with an array of options that let it work according to one's requirements. Some tools have also been built on top of RSync to provide a GUI for it, but most of them are not stable.

Refs: [1, 2, 3]

IITBombayX Platform

1. Open-source platform that brings students and faculty together on a common platform.
2. Faculty from different universities offer courses on the platform.
3. Students can then register for the courses offered.
4. Students work through the course content, take tests, quizzes, etc., and are assessed.

Refs: [4]


1. CMS: Content Management System, or IITBombayX Studio. The CMS manages the content of a course, with facilities to add, delete, or edit courseware.
2. LMS: Learning Management System. Students interact with IITBombayX through the LMS, which provides facilities such as quizzes, comments, feedback, submissions, and other interactions. It is also responsible for displaying content on the web page.
3. Configuration: provides configuration management when setting up a new IITBombayX platform. It uses Ansible, an open-source platform for configuring and managing machines.
4. XBlock: used to create courseware for a course. Courseware follows a hierarchical structure: it is composed of components, where a component may be as simple as a paragraph, an input form, or a video, or as complex as a section, a chapter, or a complete course.
5. edx-ora2: handles all assessment-related activities.

Refs: [4]


1. CS Comments Service: facilitates nested comments and voting.
2. XQueue: provides a checking interface for the LMS. When a student makes a submission through the LMS, the submission goes to XQueue, which has it assessed and graded by an external service and sends the assessment back to the LMS.
3. XServer: takes code submissions received by the LMS and runs the code using courseware checkers.
4. notifier: sends daily feeds from the forums to students registered on them.
5. Analytics Dashboard: displays metadata about activity in courses, such as enrollments, student performance, etc.
6. Analytics Pipeline: analyzes data from tracking logs and IITBombayX databases and provides the analyzed information to the outside world through edx-analytics-data-api.

Refs: [4]


Related Work

Lucky Backup [5]: Lucky Backup is a free desktop application that runs on top of RSync and provides the following features [6]:
1. Backup: keeps a backup of the data on a remote machine, so that whenever files are added, deleted, or modified, all the files and changes are backed up on the remote machine.
2. Snapshots: the user can take a snapshot of the directory being backed up or synced and store it. A snapshot can later be restored, bringing the directory back to the state it was in when the snapshot was taken.
3. Sync: the user can sync multiple pairs of directories, so that whenever a synced file changes, the change is reflected at the other locations.
4. Exclude option: excludes one or more files based on names or name patterns.
5. Simulation: a very useful feature for users who are not sure of the outcome of an RSync command. They can first run a simulation of the command (rsync's --dry-run), which reports the same results as the real command but makes no changes at either end of the synchronization. Once sure, the user can go ahead and execute the command.

FlyBack [7]: FlyBack was initially built on top of RSync but has since been rewritten from scratch. It is most useful when incremental changes to files need to be maintained: it backs up incremental changes, which can later be retrieved.
Grsync [8]: Grsync is an open-source utility under the GPL. It is built on top of RSync and syncs directories locally or over the network. Grsync is also a desktop application and has no web interface. It supports almost all of the features provided by Lucky Backup. It runs on Mac OS, and a Windows version is also available.
Gadmin: Gadmin is another tool with almost the same facilities as Grsync, but it supports only Linux.
Refs: [5, 7, 8]

Problems with already available technologies

1. Most of the available tools are desktop applications.
2. We will develop a web-based application that can easily be configured and provides file synchronization.
3. We can open web service endpoints through which various information about each pair of servers can be viewed.
4. With desktop applications, a separate web application would have to be published to provide this information through web services.

RSync Algorithm

Setup:
1. File X on node A has to be made consistent with file Y on node B.
2. Nodes A and B are connected over a (slow) network.
Algorithm:
1. RSync uses delta differencing.
2. It divides the file into blocks and calculates checksums of the blocks.
3. Using the checksums, it finds out which blocks have changed and sends only those over the network to the other machine.

Refs: [1, 2, 3]


1. B splits file Y into a series of non-overlapping fixed-size chunks of size s. The last chunk may be shorter than s.
2. For each chunk, B computes two checksums: a weak, rolling 32-bit checksum and a strong 128-bit MD5 checksum.
3. B sends the checksums to A.
4. A computes checksums for every possible chunk of size s in file X, i.e., starting at every byte offset.
5. A compares the weak checksum of each chunk with the list of checksums sent by B. A chunk whose weak checksum matches one of B's must then also match the strong checksum. If both the weak and the strong checksums match, the chunk is considered the same.
6. Matching checksums do not guarantee that two chunks are identical, but the chance that two different chunks have both the same weak and the same strong checksum is negligible.
7. Based on these comparisons, A sends the offsets of the chunks that matched, along with where those chunks are to be placed in Y.
8. A also sends "literal" data, i.e., the data that did not match any block sent by B.
9. Upon receiving the matched chunks and the literal data, each with its corresponding offset, B reconstructs file Y.
10. Y on B is now the same as X on A.
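The steps above can be sketched in Java as follows. This is a simplified, in-memory illustration of the block-matching idea, not rsync's actual implementation or wire format: the class and method names are hypothetical, the weak checksum is an additive a/b checksum in the style of rsync's, and for clarity it is recomputed at every offset instead of being rolled. MD5 comes from the JDK's MessageDigest.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory sketch of rsync-style block matching (not rsync's real code).
final class RsyncDelta {
    // One delta instruction: a reference to block #blockIndex of Y, or literal bytes.
    static final class Op {
        final int blockIndex;  // -1 when this op carries literal data
        final byte[] literal;  // null when this op references a block of Y
        Op(int blockIndex, byte[] literal) { this.blockIndex = blockIndex; this.literal = literal; }
    }

    // Weak checksum in the style of rsync: a = sum of bytes, b = position-weighted sum (mod 2^16).
    static int weak(byte[] d, int off, int len) {
        int a = 0, b = 0;
        for (int i = 0; i < len; i++) {
            int x = d[off + i] & 0xff;
            a = (a + x) & 0xffff;
            b = (b + (len - i) * x) & 0xffff;
        }
        return (b << 16) | a;
    }

    // Strong checksum: 128-bit MD5 of the chunk.
    static byte[] md5(byte[] d, int off, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(d, off, len);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Steps 1-8: B's signatures of Y, then A's scan of X at every byte offset.
    static List<Op> delta(byte[] x, byte[] y, int s) {
        Map<Integer, List<Integer>> byWeak = new HashMap<>();  // weak checksum -> block indices
        List<byte[]> strong = new ArrayList<>();
        for (int k = 0; k * s < y.length; k++) {
            int len = Math.min(s, y.length - k * s);            // last block may be shorter
            byWeak.computeIfAbsent(weak(y, k * s, len), w -> new ArrayList<>()).add(k);
            strong.add(md5(y, k * s, len));
        }
        List<Op> ops = new ArrayList<>();
        ByteArrayOutputStream literal = new ByteArrayOutputStream();
        int i = 0;
        while (i < x.length) {
            int len = Math.min(s, x.length - i);
            int match = -1;
            List<Integer> candidates = byWeak.get(weak(x, i, len));
            if (candidates != null) {                            // weak hit: confirm with MD5
                byte[] h = md5(x, i, len);
                for (int k : candidates) {
                    int blockLen = Math.min(s, y.length - k * s);
                    if (blockLen == len && Arrays.equals(h, strong.get(k))) { match = k; break; }
                }
            }
            if (match >= 0) {
                if (literal.size() > 0) { ops.add(new Op(-1, literal.toByteArray())); literal.reset(); }
                ops.add(new Op(match, null));
                i += len;                                        // skip past the matched chunk
            } else {
                literal.write(x[i]);                             // no match: byte becomes literal data
                i++;
            }
        }
        if (literal.size() > 0) ops.add(new Op(-1, literal.toByteArray()));
        return ops;
    }

    // Step 9: B rebuilds the new content from its old Y plus the delta.
    static byte[] apply(byte[] y, List<Op> ops, int s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Op op : ops) {
            if (op.literal != null) out.write(op.literal, 0, op.literal.length);
            else out.write(y, op.blockIndex * s, Math.min(s, y.length - op.blockIndex * s));
        }
        return out.toByteArray();
    }
}
```

Only literal bytes and block indices would cross the network, which is where the bandwidth saving of delta differencing comes from when X and Y share most of their content.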

Rolling Checksum: The most time-consuming step above is comparing the checksums sent by B against every possible chunk of size s in X. However, if the rolling checksum used by the algorithm has the following property, this comparison reduces to a single pass over file X: the checksum of the buffer X(2)...X(n+1) can be computed cheaply from the checksum of the buffer X(1)...X(n), together with X(1) and X(n+1).

Figure: Diagram showing RSync Algorithm Working

Refs: [1, 2, 3]
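The rolling property can be checked with a small Java sketch. It assumes the additive checksum form used for rsync's weak checksum: a is the sum of the bytes in the window and b weights each byte by its distance from the end of the window, both modulo 2^16. The class name RollingChecksum is hypothetical. Sliding the window by one byte needs only X(1), X(n+1), and the old sums, so each step costs O(1) instead of O(n).

```java
// Hypothetical sketch of an rsync-style rolling weak checksum over a window of n bytes.
final class RollingChecksum {
    private static final int MOD = 1 << 16;
    private final int n;  // window length
    private int a;        // sum of the bytes in the window, mod 2^16
    private int b;        // sum of (n - i) * byte_i for i = 0..n-1, mod 2^16

    // Computes the checksum of d[off .. off+n-1] from scratch: O(n).
    RollingChecksum(byte[] d, int off, int n) {
        this.n = n;
        for (int i = 0; i < n; i++) {
            int x = d[off + i] & 0xff;
            a = (a + x) % MOD;
            b = (b + (n - i) % MOD * x) % MOD;
        }
    }

    // Slides the window right by one byte: `out` is X(1), `in` is X(n+1). O(1).
    void roll(byte out, byte in) {
        int xOut = out & 0xff;
        int xIn = in & 0xff;
        a = Math.floorMod(a - xOut + xIn, MOD);
        b = Math.floorMod(b - (n % MOD) * xOut + a, MOD);  // uses the updated a
    }

    int value() { return (b << 16) | a; }
}
```

The b update works because removing X(1) drops its weight n, and every remaining byte gains one unit of weight, which is exactly the new a.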


Architecture And Design

Figure: File Synchronization Between Colleges and Servers

Overall Architecture

1. The File Synchronization Utility is a web application.
2. Pairs of directories need to be synced.
3. Directories are grouped by server.
4. In each pair, one directory must be on the local server while the other is on a remote server (an rsync requirement: rsync cannot synchronize two remote locations with each other directly).
5. Servers are grouped by college for easier management of servers and directory pairs.
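The local/remote constraint shows up directly in how an rsync command line is built. The Java sketch below is hypothetical (the class, method, and parameter names are not from the project): it turns a directory pair into an argument list in which exactly one side carries a user@host: prefix. The flags are standard rsync options: -a (archive mode), -z (compress during transfer), and -e ssh (use SSH as the remote shell).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that turns a directory pair into an rsync argument list.
final class RsyncCommand {
    // localDir: directory on the server running the web application
    // remoteUser/remoteHost: SSH login and address of the remote server (e.g. serverip)
    // push: true copies local -> remote, false copies remote -> local
    static List<String> build(String localDir, String remoteUser, String remoteHost,
                              String remoteDir, boolean push) {
        String remote = remoteUser + "@" + remoteHost + ":" + remoteDir;
        List<String> cmd = new ArrayList<>();
        cmd.add("rsync");
        cmd.add("-az");   // archive mode, compress data during transfer
        cmd.add("-e");
        cmd.add("ssh");   // use SSH as the remote shell
        // Trailing slash on the source: copy the directory's contents, not the directory itself.
        cmd.add(push ? withSlash(localDir) : withSlash(remote));
        cmd.add(push ? remote : localDir);
        return cmd;
    }

    private static String withSlash(String path) {
        return path.endsWith("/") ? path : path + "/";
    }
}
```

The list can be handed to new ProcessBuilder(cmd).start(). Adding rsync's --delete option would also remove files that no longer exist at the source, turning the pair into a strict mirror.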

File Synchronization Utility

Figure: Use case Diagram for File Synchronization Module


1. Login: the admin should be able to log into the web interface using a username and password.
2. Manage Servers: the admin should be able to add or remove the servers with which the hosting server syncs.
3. Manage Directories: the admin should be able to manage the pairs of directories that need to be synced.
4. Manage Periodic Synchronization: the admin should be able to set periodic refreshes at a regular interval, e.g., two directories synced every hour, every n hours, or every m days. The admin should also be able to sync instantly.
5. View Status: the admin should be able to view the status of servers and directories, for example which directories are synced and which need to be synced. The admin should also see how much time remains before each directory is synced automatically, and the current progress of any sync process that is running.

Database for File Synchronization Utility

Figure: ER Diagram for File Synchronization Utility


CREATE TABLE userinfo (
  loginId varchar(11) NOT NULL,
  loginPassword varchar(30) NOT NULL DEFAULT 'password',
  fullName varchar(40) DEFAULT 'User Name',
  userRole varchar(20) NOT NULL DEFAULT 'user',
  PRIMARY KEY (loginId)
);

CREATE TABLE collegedata (
  collegeid int(11) NOT NULL AUTO_INCREMENT,
  collegename varchar(50) NOT NULL DEFAULT 'College Name',
  collegeaddress varchar(60) NOT NULL DEFAULT 'College Address',
  collegelat double DEFAULT NULL,
  collegelon double DEFAULT NULL,
  PRIMARY KEY (collegeid)
);


CREATE TABLE serverdata (
  serverid int(11) NOT NULL AUTO_INCREMENT,
  servername varchar(30) NOT NULL DEFAULT 'Server Name',
  servercollege int(11) NOT NULL,
  serverlat double NOT NULL,
  serverlon double NOT NULL,
  serverip varchar(30) NOT NULL DEFAULT 'localhost',
  serverappport int(11) NOT NULL DEFAULT '8181',
  PRIMARY KEY (serverid),
  UNIQUE KEY serverip (serverip)
);

CREATE TABLE directoriesinfo (
  directoryid int(11) NOT NULL AUTO_INCREMENT,
  directoryPath varchar(70) NOT NULL,
  directoryServer int(11) NOT NULL,
  PRIMARY KEY (directoryid)
);

CREATE TABLE directoriesrelation (
  dirrelid int(11) NOT NULL AUTO_INCREMENT,
  sourcedirectoryid int(11) NOT NULL,
  destinationdirectoryid int(11) NOT NULL,
  scheduletype varchar(30) NOT NULL DEFAULT 'None',
  scheduleinterval int(11) NOT NULL DEFAULT '1',
  isactive tinyint(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (dirrelid)
);

CREATE TABLE synchistory (
  syncid int(11) NOT NULL AUTO_INCREMENT,
  directoriesrelationid int(11) NOT NULL,
  syncstarttime datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  syncendtime datetime DEFAULT NULL,
  synclogs longtext,
  lastUpdated datetime DEFAULT NULL,
  iskilled tinyint(1) NOT NULL DEFAULT '0',
  processstatus varchar(30) DEFAULT NULL,
  PRIMARY KEY (syncid)
);

CREATE TABLE notificationinfo (
  notificationid int(11) NOT NULL AUTO_INCREMENT,
  synchistoryid int(11) NOT NULL,
  notificationtext varchar(300) DEFAULT NULL,
  notificationcreated datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  isread tinyint(1) NOT NULL DEFAULT '0',
  notificationcode int(11) NOT NULL DEFAULT '1',
  PRIMARY KEY (notificationid)
);

File Synchronization Admin Utility Overview

Figure: File Synchronization Admin Utility Overview

Android Application Architecture

1. An extra utility that can be configured to connect to the server.
2. Start and control sync processes.
3. View the history of previous sync processes.
4. Get notified on the fly about failed sync processes.
5. Gives the admin flexibility as well as quick alerts.


1. The Android application connects to the server through the HttpClient library.
2. It authenticates the user through Moodle or the web application's local database, depending on what the user selects.
3. All data communication between the Android application and the server is through RESTful service calls.

Use Case Setup

1. Admin logs into the web application through the local database or through Moodle.
2. Admin creates a list of colleges, and of the servers within each college along with their IP addresses.
3. Admin selects each server in turn and creates the directory pairs that need to be synced.
4. For each directory pair, the admin may select a period for regular synchronization or leave it as 'None'.

Use Case Use

1. Admin logs into the web application through the local database or through Moodle.
2. Admin selects a college to work on.
3. Admin selects a server within the college to work on.
4. Admin sees the list of local source and local destination directories.
5. Admin selects a local source or local destination directory to work on.
6. Admin selects the paired directory of the selected local directory to work on.


1. The admin can perform the following actions:
   • Select “View History”: shows a list of timestamps at which sync processes took place. The admin selects one of the timestamps to see the corresponding process's logs.
   • Select “Sync Now”: a RESTful service is called which starts a sync process on the selected pair of directories, but only if no other sync process is in progress for that pair.
   • Select “Delete Pair”: a popup asks the admin to confirm the deletion. If the admin selects “Cancel”, nothing happens; if the admin selects “Ok”, a RESTful service is called which deletes the selected pair of directories.
2. Admin selects “Map View”: the admin is taken to another page that shows the servers on a map, with a blue stroke for servers where a sync is in progress. This gives the admin a better picture of which servers are syncing.
3. Admin selects the “Notifications Icon”: a popup opens showing failed sync processes, along with the servers and directories for which they failed.
4. Admin changes server settings, such as the admin's name, the admin's password, the web application's port, etc.

Implementation Details

1. The whole application is divided into modules.
2. The modules perform different tasks and coordinate with each other to serve the purpose of the application as a whole.

Authentication Module

This module has two parts:
1. Local MySQL database authentication: the user is authenticated against the local MySQL database.
2. Moodle authentication: the user is authenticated through Moodle.

File Synchronization Web Application

1. Reverse SSH Checker Module: checks whether the web application's server can be reached from a remote server through SSH, and exposes the result to the web application's front end as a RESTful endpoint.
2. Schedule Handler Module: regularly checks every directory pair that has periodic synchronization set up. If more time has elapsed since the pair's last sync than its configured period, it starts a new sync process for that pair. It does not start a sync process for a pair whose sync is already in progress.
3. Sync Now Module: lets the admin select a pair of directories and start a sync process for it immediately. It likewise does not start a sync process for a pair whose sync is already in progress.
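A minimal Java sketch of the two checks the Schedule Handler and Sync Now modules share: whether a pair is due for periodic synchronization, and whether a sync for it is already running. The unit names for scheduletype ("Minutes", "Hours", "Days") and all class and method names are assumptions; only the 'None' default comes from the database schema. The in-progress set lives in memory here, whereas the application could equally consult the synchistory table.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the Schedule Handler / Sync Now checks.
final class SyncScheduler {
    // ids (dirrelid) of directory pairs whose sync is currently running
    private final Set<Integer> inProgress = ConcurrentHashMap.newKeySet();

    // Maps scheduletype/scheduleinterval to a period; unit names are assumptions.
    static Duration period(String scheduleType, int interval) {
        switch (scheduleType) {
            case "Minutes": return Duration.ofMinutes(interval);
            case "Hours":   return Duration.ofHours(interval);
            case "Days":    return Duration.ofDays(interval);
            default:        return null;  // 'None': no periodic sync
        }
    }

    // Due once a full period has elapsed since the last sync.
    static boolean isDue(Instant lastSync, Duration period, Instant now) {
        return period != null && !now.isBefore(lastSync.plus(period));
    }

    // Time left before the automatic sync (zero if already due); period must be non-null.
    static Duration remaining(Instant lastSync, Duration period, Instant now) {
        Duration left = Duration.between(now, lastSync.plus(period));
        return left.isNegative() ? Duration.ZERO : left;
    }

    // Atomically claims the pair; false means a sync for it is already in progress.
    boolean tryStart(int pairId) { return inProgress.add(pairId); }

    // Releases the pair when its rsync process ends (success, failure, or kill).
    void finish(int pairId) { inProgress.remove(pairId); }
}
```

Because tryStart relies on Set.add returning false for an element that is already present, the Schedule Handler and Sync Now paths cannot both start the same pair even if they race; remaining() is also what the View Status screen would need to show the time left.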


1. Check Availability Module: checks whether a directory is available, locally or on a remote server.
2. Rsync command status Module: constantly monitors sync processes that are in progress. If a process has been running for too long without generating any logs, the module kills it, so that dangling processes (for example, ones waiting for user input or simply not responding for a long time) do not pile up on the server.
3. Notifications Module: creates notifications in the database, which are then fetched by the web application's front end and shown to the admin in the notification tray, or shown as notifications in the Android application.
4. Map Module: takes latitude-longitude information from the database and passes it to the front end, where the admin sees a map with a marker for each server. It also sends information about in-progress sync processes, which are displayed as blue strokes on the map.
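The stale-process rule of the Rsync command status Module can be sketched as a pure function over the synchistory.lastUpdated timestamps of running syncs. The threshold, class name, and method names are hypothetical; the kill step uses the JDK's Process.destroyForcibly().

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the stale-process check behind the Rsync command status Module.
final class SyncWatchdog {
    private final Duration maxSilence;  // how long a sync may run without new log output

    SyncWatchdog(Duration maxSilence) { this.maxSilence = maxSilence; }

    // lastUpdated: syncid -> time of the last log line of each running sync
    // (a caller could pass syncstarttime when lastUpdated is still NULL).
    // Returns the syncids whose processes should be killed and marked iskilled = 1.
    List<Integer> staleSyncs(Map<Integer, Instant> lastUpdated, Instant now) {
        List<Integer> stale = new ArrayList<>();
        for (Map.Entry<Integer, Instant> e : new TreeMap<>(lastUpdated).entrySet()) {
            if (Duration.between(e.getValue(), now).compareTo(maxSilence) > 0) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    // Forcibly terminates a stale rsync process.
    static void kill(Process p) {
        p.destroyForcibly();
    }
}
```

Keeping the decision separate from the kill makes the threshold logic testable without spawning real processes.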

Authentication Module

Figure: Authentication Module Architecture

Synchronization Module

Figure: Synchronization Module Architecture


1. Core module that starts a sync process and does the book-keeping for it.
2. Java creates a new process using Runtime.getRuntime().exec().
3. The spawned process calls a batch file that takes the source and destination directory paths.
4. The Java module reads the input stream of the batch file's process.
5. Whatever output the process generates is streamed into a Java InputStream.
6. A new thread is created for every request to synchronize a directory, so the Ajax request is not blocked.
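A minimal sketch of the pattern described above: start the sync command on its own thread so the triggering Ajax request returns immediately, and stream the process output line by line. It uses ProcessBuilder, the JDK's more configurable counterpart to Runtime.getRuntime().exec(); redirectErrorStream(true) merges stderr into the same stream so that an unread error stream cannot fill up and stall the process. The class name, method names, and log callback are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the Synchronization module's process handling.
final class SyncRunner {
    // Starts the sync command (e.g. the script taking source and destination paths)
    // on its own thread, so the request that triggered it is not blocked.
    static Thread startAsync(List<String> command, Consumer<String> onLogLine) {
        Thread worker = new Thread(() -> {
            try {
                Process p = new ProcessBuilder(command)
                        .redirectErrorStream(true)   // merge stderr into stdout
                        .start();
                pump(p.getInputStream(), onLogLine); // the process output, seen from Java
                onLogLine.accept("exit code " + p.waitFor());
            } catch (IOException e) {
                onLogLine.accept("sync error: " + e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt status
            }
        }, "sync-worker");
        worker.start();
        return worker;
    }

    // Reads process output line by line; each line could be appended to
    // synchistory.synclogs and used to refresh lastUpdated (assumption).
    static void pump(InputStream in, Consumer<String> onLogLine) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                onLogLine.accept(line);
            }
        }
    }
}
```

Separating pump() from process creation means the log handling can be exercised with any InputStream, for example a ByteArrayInputStream in a test.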

Sync History Module

1. Maintains the history of all sync processes, along with their logs and timestamps.
2. Maintains the 'synchistory' table in the database.
3. The table is used by the Synchronization Status Module to see which processes are in progress and which have not updated their logs for a long time (through the 'lastUpdated' column's value).
4. It also contains an 'isKilled' column, which indicates that the Synchronization Status Module killed the corresponding process.

SSH Checker Module

1. Checks whether a remote server is reachable through SSH from the web application's server.
2. SSH-reachable: a remote server is called "ssh-reachable" if the current server can communicate with it through ssh. This requires three conditions:
   1. The current server should have ssh enabled.
   2. The remote server should have ssh enabled.
   3. The current server's public ssh key should be included in the "~/.ssh/authorized_keys" file on the remote server.
3. ssh is required because "rsync" works over ssh to communicate between the master and slave servers.
4. When a request for "ssh-checker" arrives, "ssh-checker" spawns a new thread, "SshChecker".
5. The "SshChecker" thread launches a new process (through Runtime.getRuntime()) and communicates with it through the process's InputStream and OutputStream.
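The thread-plus-timeout pattern used by the SSH checker (described here and on the following slide) can be sketched in Java as below. The class and method names are hypothetical; the two-line rule and the 10-second default come from the slides. One caveat: interrupt() does not unblock a thread stuck in a read on a real Process stream, so a production version would also destroy the ssh process after the timeout; running ssh with -o BatchMode=yes makes it fail instead of prompting for a password, which avoids the hang altogether.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the ssh-checker's timeout pattern.
final class SshReachability {
    // sshOutput: the spawned ssh process's output, read via Process.getInputStream().
    // minLines: lines that must appear before the server counts as reachable (2 in the slides).
    // timeoutMillis: how long the main thread waits (10 000 by default in the slides).
    static boolean check(InputStream sshOutput, int minLines, long timeoutMillis)
            throws InterruptedException {
        AtomicBoolean reachable = new AtomicBoolean(false);
        Thread child = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(sshOutput, StandardCharsets.UTF_8))) {
                int seen = 0;
                while (r.readLine() != null) {
                    if (++seen >= minLines) {
                        reachable.set(true);  // enough output: ssh logged in without a password
                        return;
                    }
                }
            } catch (IOException e) {
                // Interrupted or closed stream: leave reachable = false.
            }
        }, "SshChecker");
        child.setDaemon(true);      // a stuck child must not keep the JVM alive
        child.start();
        child.join(timeoutMillis);  // wait at most timeoutMillis for the child
        if (child.isAlive()) {
            child.interrupt();      // give up on the child, as the slides describe
        }
        return reachable.get();     // false corresponds to the slides' "returns 0"
    }
}
```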


1. After spawning the new thread, the main thread of "ssh-checker" calls the join(long millis) method on it. join() makes the main thread wait for the child thread to finish. When the "millis" argument is given, the main thread waits for at most that many milliseconds. By default the main thread waits 10 seconds (configurable) for the child thread to finish.
2. The child thread reads the InputStream of the spawned process, which in turn calls "ssh" on the specified server. The "ssh" command behaves as follows:

   1. If the public SSH key of the local server is included in the authorized keys of the remote server, ssh writes some output to its output stream (which is normally displayed on the terminal when ssh is run from a terminal).
   2. The Java thread that spawned this process waits for at least 2 lines to appear on the process's input stream. If they appear, it reports that an SSH connection to the remote server is possible.
   3. If the SSH key of the local server is not included in the authorized keys of the remote server, ssh waits for the user to enter the remote server's password. In that case there is no output on the process's output stream, so the child thread keeps waiting for input on the process's input stream.
   4. After spawning the child thread, the main thread waits 10 seconds (configurable). If ssh cannot connect, the child thread never returns. After 10 seconds the main thread calls interrupt() on the child thread and returns 0, meaning the remote server cannot be reached through SSH.

Reverse SSH Checker Module

Figure: Reverse SSH Checker Module Architecture

Reverse SSH Checker Module

1. Handles scheduling of sync processes.
2. It scans all the directory pairs that need to be synced. After getting the ids of all pairs, it looks up the latest entry for each id in the "synchistory" table.
3. For each such entry, it checks whether the end time of the sync is NULL. If it is NULL, the sync is still in progress.
4. If the end time is not NULL, it compares the end time with the "syncinterval" of the pair to find out whether the current time has passed the point at which the pair is due to be synced again. If so, it launches a new thread that starts syncing the two directories. If not, it skips the pair and moves on to the next one.
5. It repeats these steps every minute, using Thread.sleep(60000) on the "ScheduleHandler" thread.
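The scheduling rule above can be sketched in Java as a pure "is this pair due?" check plus a loop that re-scans every minute. This is a sketch under assumptions. The database scan is abstracted as a hypothetical Runnable (scanAllPairs). The slides do not say how a pair with no history is treated, so the sketch assumes it is due immediately.

```java
import java.time.Duration;
import java.time.Instant;

class ScheduleHandler implements Runnable {

    private final Runnable scanAllPairs;  // hypothetical: reads pairs and latest "synchistory" rows, starts due syncs
    private final long scanPeriodMillis;  // 60000 in the slides

    ScheduleHandler(Runnable scanAllPairs, long scanPeriodMillis) {
        this.scanAllPairs = scanAllPairs;
        this.scanPeriodMillis = scanPeriodMillis;
    }

    /**
     * Decides whether one directory pair is due for a sync.
     * latestEnd is the end time of the pair's latest "synchistory" row (null while that
     * sync is still running); hasHistory is false if the pair has never been synced.
     */
    static boolean isDue(boolean hasHistory, Instant latestEnd, Duration syncInterval, Instant now) {
        if (!hasHistory) {
            return true;  // assumption: a never-synced pair is due immediately
        }
        if (latestEnd == null) {
            return false; // end time NULL: the previous sync is still in progress
        }
        // Due once the interval has elapsed since the last sync finished.
        return !now.isBefore(latestEnd.plus(syncInterval));
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            scanAllPairs.run();
            try {
                Thread.sleep(scanPeriodMillis); // the slides use Thread.sleep(60000)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // leave the loop on shutdown
            }
        }
    }
}
```

A ScheduledExecutorService with scheduleWithFixedDelay is a standard alternative to the hand-written sleep loop. The decision logic in isDue would stay the same.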

Sync Now Module

1. Lets the admin start a sync process for a directory pair immediately.
2. "Sync Now" first checks whether an SSH connection to the remote server is possible.
3. If the connection is not possible, it does not try to sync the two directories.
4. If an SSH connection to the remote server is possible, it starts the sync process as usual through the Synchronization Module.
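"Sync Now" is essentially a guard placed in front of the normal sync path. The Java sketch below shows that ordering with injected functions. The names syncNow, sshReachable and startSync are hypothetical. In the web application the check would be backed by the SSH Checker Module and the action by the Synchronization Module.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

class SyncNow {

    /**
     * Starts a sync for the given pair only if its remote server passes the SSH check.
     * Returns true if a sync was started.
     */
    static boolean syncNow(String pairId, String remoteServer,
                           Predicate<String> sshReachable, Consumer<String> startSync) {
        if (!sshReachable.test(remoteServer)) {
            return false; // no SSH route: rsync would fail, so do not start it
        }
        startSync.accept(pairId); // same path as a scheduled sync
        return true;
    }
}
```

Checking reachability first lets the admin get an immediate answer instead of waiting for an rsync process that is bound to fail.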

Check Availability Module

Figure: Check Availability Module Architecture

Rsync command status Module

1. Whenever a new rsync process is initiated, a new entry is added to the "synchistory" table with a new id, "syncstarttime" set to the current time, and the other columns null.
2. For every rsync process that is initiated, a new "ProcessStatusChecker" thread is launched.
3. The "ProcessStatusChecker" thread loops. In every iteration (with a delay of 10 seconds) it checks whether the process's "syncendtime" entry exists in the "synchistory" table. If it does, the process has completed and the thread simply terminates itself.
4. If the "syncendtime" entry does not exist, the process is still running.
5. The rsync process launcher module keeps reading the output generated by the process and appends each new line to the "synclog" column of that process's row in "synchistory". It also sets the "lastupdated" column to the current timestamp, so that we know when the last output from the corresponding process arrived. With progress reporting enabled, rsync keeps writing to standard output roughly every second, showing the current file and its percentage complete. So, for a process that is running normally, "lastupdated" should never lag the current time by much more than a second.

Rsync command status Module

6. If the "ProcessStatusChecker" thread finds the "syncendtime" column null, it looks at the "lastupdated" column. If "lastupdated" is null, the process has not produced any output yet. In that case the thread computes the difference between the current time and "syncstarttime". If this exceeds 120 seconds, it kills the process and terminates itself, setting the "iskilled" column to "true" in the process's row of the "synchistory" table.
7. Similarly, if "syncendtime" is null and "lastupdated" is not null, the thread computes the difference between the current time and "lastupdated" and proceeds as in step 6.
8. At the end of the process, p.exitValue() is checked. It is 0 if the process succeeded and non-zero if there was an error. The "Process Launcher" module sets the "processstatus" column of the corresponding "synchistory" row to "Successful" if exitValue() is 0, and to "Failed: <exitValue>" otherwise.
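The per-iteration decision of the "ProcessStatusChecker" thread can be written as a small pure function over one "synchistory" row. The Java sketch below reads step 7 as applying the same 120-second limit to the time since "lastupdated". That reading is an assumption. The class, enum and method names are hypothetical. In the real thread, check() would be called every 10 seconds, and a KILL verdict would lead to destroying the process and setting "iskilled".

```java
import java.time.Duration;
import java.time.Instant;

class ProcessStatusRules {

    enum Verdict { FINISHED, RUNNING, KILL }

    /** Silence limit, taken from the 120 seconds in the slides. */
    static final Duration SILENCE_LIMIT = Duration.ofSeconds(120);

    /**
     * One checker iteration over a row's columns:
     * syncstarttime, lastupdated (nullable) and syncendtime (nullable).
     */
    static Verdict check(Instant syncStart, Instant lastUpdated, Instant syncEnd, Instant now) {
        if (syncEnd != null) {
            return Verdict.FINISHED; // process done: the checker thread stops
        }
        // No output yet: measure from the start; otherwise from the last output line.
        Instant reference = (lastUpdated == null) ? syncStart : lastUpdated;
        Duration silent = Duration.between(reference, now);
        return silent.compareTo(SILENCE_LIMIT) > 0 ? Verdict.KILL : Verdict.RUNNING;
    }

    /** Value written to the "processstatus" column once the process has exited. */
    static String processStatus(int exitValue) {
        return exitValue == 0 ? "Successful" : "Failed: " + exitValue;
    }
}
```

Keeping the rule separate from the thread and the database makes the timeout behaviour easy to test with fixed timestamps.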

Notifications Module

1. Responsible for managing the database records for notifications that need to be generated for failed sync processes.
2. Whenever a process fails (the rsync command fails and returns a non-zero exitValue()), an entry is made in the "notificationinfo" table in the database.

Map Module

1. When the user clicks the "Map View" button on the UI, manageMap.jsp is opened.
2. On this page the user can see a marker for each server configured in the database.
3. For every directory pair whose sync is ongoing, a blue stroke is drawn between the servers where the directories reside. This gives the admin an overview of ongoing sync processes.

Android Application

1. Authenticates the user on the first screen (against the web application's local MySQL database or through Moodle).
2. After authentication, it shows a ListView of all the colleges configured by the admin in the database. The list is fetched by calling a Spring RESTful service through Java's HttpClient library.
3. When the user selects a college, the same activity shows all the servers configured by the admin for that college, fetched through the same kind of Spring RESTful service call.
4. After selecting a server, the user sees all the local directories on that server in a ListView.
5. Once the user selects one of the local directories, a second ListView below it is populated with the directories paired with the selected directory.
6. After the user selects a remote directory, the next activity shows two buttons: "View History" and "Sync Now".
7. If the user taps "View History", a ListView below the buttons is populated with all previous sync processes, identified by their start time.

Android Application

1. When the user selects one of the history items in the ListView, the logs for that sync process are shown below the ListView.
2. If the user taps "Sync Now", a new sync process is started for the selected directory pair.
3. A background service keeps running and sends HttpClient requests to one of the RESTful services on the server.
4. The RESTful service returns all the failed sync history items.
5. The Notification Module generates a notification in the notification pane of the Android device.
6. This notifies the admin whenever a sync process fails.

Web Application Back End

1. MVC Architecture: Divides the application into modules that take care of business logic (Model), control flow (Controller) and user pages (View).
Alternatives: Model-View-Presenter (MVP) is an alternative to the MVC architecture.
Advantages:
1. Database connections are easier to manage than in a conventional web application.
2. Modular approach.
3. Unlike MVP, it does not decouple the view completely, which keeps the web application easier to understand for maintainers.
4. Separates responsibilities cleanly, i.e., user experience in the views, all business logic and database operations in the model, and flow control in the controllers.

Refs: [9]

Web Application Back End

Spring Framework (MVC): The Spring Framework provides support for the MVC architecture.
Alternatives:
1. EJB (Enterprise JavaBeans) was used earlier for implementing web application back ends.
2. Struts Framework: Struts also provides a good framework for building web applications. However, it imposes many constraints on the implementation, and the application ends up tied to Struts throughout, with few other options to explore.
Advantages:
1. Dependency injection: Spring supports dependency injection, so components are less dependent on each other.
2. It provides a very clean separation between models, views and controllers.
3. It supports controllers as well as interceptors, giving better control over the application flow and allowing requests to be intercepted wherever needed.
4. XSLT, Velocity, etc. can be used instead of JSPs.
5. Unlike Struts, Spring does not require controllers to extend any particular class.
6. Provides dependency injection for models, i.e., the business logic.

Web Application Back End

The technologies above will be used to implement the back end of the slave side. Apart from these, an SSH key needs to be shared before rsync is called. The steps to add SSH keys are:
1. Run the command: ssh-keygen -t rsa
   This generates the key pair.
2. Once the key pair is generated, copy the public key into the master server's authorized keys by appending the contents of the id_rsa.pub file to the ~/.ssh/authorized_keys file on the master server.

Web Application Front End

JQuery: jQuery is a powerful library built on top of JavaScript that considerably reduces the number of lines of code.
Alternatives: There is no real alternative to JavaScript for client-side scripting, but plain JavaScript can replace jQuery code completely.
Advantages:
1. Fewer lines of code, so the code is easier to manage.
2. Built-in functions that reduce coding effort.
3. Very powerful CSS-selector-based selection of elements in the DOM.
4. Fits well with CSS.

Web Application Front End

JQuery Widgets: Provide a modular approach to client-side scripting.
Alternatives: Widgets give web pages a modular structure. Such a structure is also supported at a higher level by frameworks like Backbone.js.
Advantages:
1. Easy to implement, and not as heavy a library as Backbone.js.
2. Provide a modular approach to building components of the web page.
3. Have a well-defined life cycle, including create, init and destroy, plus options for configuring values inside the widget.

Web Application Front End

CSS: CSS is a purely client-side styling language.
Alternatives: CSS has no real alternative, but styling can be embedded directly in the HTML (or JSP) page instead of being kept in separate stylesheets.
Advantages:
1. Separates styling from the web page, making it more readable and manageable.
2. Fits well with jQuery.
3. Provides readily available animations that can be added to the page for better design.

Web Application Front End

Bootstrap: Bootstrap is a free collection of tools for building better user interfaces for HTML pages.
Advantages:
1. Provides out-of-the-box elements that can be used readily to build a web page.
2. Divides the page into a grid-like structure, making it more manageable.
3. Provides media queries that make the web page responsive, so it behaves properly on different devices and screen sizes.


Conclusion And Future Work

1. The IITBombayX Platform brings faculty and students onto a common platform.
2. Faculty offer courses that students across the globe can register for.
3. If the IITBombayX server serves all requests from users across the country, it may become a bottleneck as the number of users increases, particularly when the number of concurrent users is very high.
4. Content shared by faculty or students on their college servers can be served from those servers by keeping them synced with the IITBombayX server.
5. This requires a File Synchronization Utility that helps admins keep track of and perform the synchronization.
6. For efficient use of bandwidth, the rsync tool available in Linux distributions will be used.
7. Along with basic synchronization, the utility covers managing servers and the directories to be synced, viewing the status of directories, viewing the progress of ongoing synchronization, periodic synchronization, etc.

Future Work

1. Assign different roles to users so that students, faculty and admins have different privileges in the web application.
2. Allow students/faculty to upload videos/content to a specific directory on the web application's server, either directly from the web application or from an Android device.
3. Allow the admin to add or edit servers from the Map View.
4. Give the administrator better notifications in the web UI, so that clicking a notification leads directly to the exact logs of the failed sync process.
5. Authenticate users, faculty and administrators differently through Moodle using Moodle 'Roles'. Currently, authentication only checks whether the given user is enrolled in Moodle.
6. Allow the administrator to add/delete colleges, servers and directory pairs from the Android application.

Future Work

1. Sync Process Analytics: Allow the administrator to view the time taken by different sync processes, along with analytics on that data, such as the average time taken by the last 10 sync processes or the top 10 most time-consuming sync processes. This will help the administrator see which directories hold too much data, which directory pair fails most often, etc.
2. Make the values that are currently hardcoded in the Android and web applications configurable through the database.
3. Allow the administrator to keep snapshots of directories that can be restored later. This moves towards versioning of directories, where tools such as SVN, Git, etc. can be used.
4. Map View in the Android application.

Questions

QUESTIONS?

References

T. S. Utku Irmak, Svilen Mihaylov, “Improved single-round protocols for remote file synchronization,” pp. 156–160, IEEE, 2008.

K. S. Deepak Gupta, “Remote file synchronization single-round algorithms,” International Journal of Computer Applications (0975 – 8887), 2010.

A. Tridgell and P. Mackerras, “The rsync algorithm,” 1996.

edX Platform. http://code.edx.org/, 2013-2014.

Wikipedia. http://en.wikipedia.org/wiki/LuckyBackup.

L. Avgeriou. http://luckybackup.sourceforge.net/features.html.

Wikipedia. http://en.wikipedia.org/wiki/FlyBack.

opbyte. http://www.opbyte.it/grsync/#features.

Lijin. http://orangeslate.com/2006/11/10/12-benefits-of-spring-mvc-over-struts/, 10 November 2006.
