Implementation of distributed cloud system architecture using advanced container orchestration, cloud storage, and centralized database for a web-based platform

A thesis submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of

Master of Science

In the Department of Electrical Engineering and Computer Science of the College of Engineering and Applied Science

By Sohan Karkera
Bachelor of Technology, University of Mumbai, June 2018

Thesis Advisor and Committee Chair: Arthur J. Helmicki, Ph.D.

Committee Members: Victor J. Hunt, Ph.D. and Nan Niu, Ph.D.

University of Cincinnati, December 2020

Abstract

In a distributed system based architecture, the cost of maintaining an application can be higher, as the components of the system are spread across various physical locations. Latency between components is also a recurring issue. Cloud computing has played a significant role in resolving these problems. Due to rapid technological advances, it has become more economical and easier to manage these components by substituting them with the various options available from cloud computing vendors, which offer lower latency and dashboards for easy maintenance. Moreover, with the emergence of container-based and orchestration-based technologies, it has become possible to easily manage and deploy these applications by packaging them inside containers. This research focuses on using these advanced cloud computing, container and orchestration technologies to improve the scalability of an existing web application from a single-server-based architecture to a multi-server-based architecture. The application considered in this research work is the Common Operating Platform (COP). For this, technologies like Docker and Docker Swarm have been used for containerization and orchestration, and Azure File storage has been used for the storage needs of the application. The database storage needs of the application have been fulfilled by the Oracle database. Apart from these, Python is used for scripting to improve and add features to the application. Initially, the current architecture of the COP and its shortcomings are discussed. Then the solutions to those shortcomings are discussed via the proposed architecture. Under the implementation section, the multi-server architecture of the application is discussed first, followed by a detailed discussion of the centralized storage, and finally the other enhancements made to the existing architecture. The final results can be observed in the results section. The idea is to tweak and improve the existing application architecture to make it easy to scale and manage by packaging the entire application code inside containers, making it a highly available and automated system.


Acknowledgements

I would like to take this opportunity to thank my professors, Dr. Arthur Helmicki and Dr. Victor Hunt, for their guidance and encouragement throughout this research project. I would also like to thank Dr. Nan Niu for taking time out of his busy schedule to serve on my thesis committee. A special thanks to the Ohio Department of Transportation (ODOT) for funding and providing necessary resources for the project. I would like to thank Mr. Niranjan Krishnan, Ms. Nikita Saraf, and Mr. Pranav Khekare for their help. I would like to acknowledge Mr. Aswin Subramanian, Ms. Sana Rajani, and Ms. Purva Gaikwad for helping the research by testing the application. Finally, I would like to thank my parents for supporting and encouraging me throughout this research.

Contents

1 Introduction 1
1.1 Common Operating Platform ...... 1
1.2 Problem Statement ...... 2
1.3 Motivation ...... 3
1.4 Existing System ...... 4
1.5 Proposed Approach ...... 5

2 Basic Concepts and Definitions 7
2.1 Container and Container Orchestration ...... 7
2.1.1 Docker[3] ...... 8
2.1.2 Docker Swarm[5] ...... 9
2.2 3D Modeling ...... 10
2.2.1 Pix4DEngine Server[13] ...... 10
2.2.2 OpenDroneMap[14] ...... 10
2.2.3 Point Clouds ...... 11
2.3 Microsoft Azure[22] ...... 11
2.3.1 Azure File Share[7] ...... 12
2.4 Web-server ...... 12
2.4.1 Nginx[24] ...... 13
2.5 Web Framework ...... 13
2.5.1 Django[27] ...... 14
2.6 Database ...... 15


2.6.1 Oracle Database[8] ...... 15
2.7 Jenkins[11] ...... 15

3 System Design and Architecture 17
3.1 Existing System Architecture ...... 17
3.2 Proposed System Architecture ...... 21

4 Implementation of Proposed System 25
4.1 Multi-server Architecture ...... 25
4.1.1 Dockerization of Backend Module ...... 26
4.1.2 Centralized Storage ...... 31
4.1.3 Docker Compose file creation ...... 34
4.1.4 Automation of code by removing hard-coded values ...... 38
4.1.5 Oracle Database Connectivity ...... 41
4.2 Other enhancements ...... 42
4.2.1 Upload Issues ...... 42
4.2.2 Delete Automation ...... 43
4.2.3 Logging Server Metrics ...... 48

5 Installation of Proposed System 49
5.1 System Requirements ...... 49
5.1.1 Hardware ...... 49
5.1.2 Software ...... 50
5.1.3 Docker images ...... 50
5.2 Software Configurations ...... 51
5.2.1 Building Docker images for website, backend and Pix4D modules ...... 51
5.2.2 Azure File Share ...... 52
5.2.3 Oracle Database ...... 52
5.2.4 Jenkins ...... 53
5.2.5 Manual Installation ...... 55


6 Results 57
6.1 Multi Server Architecture ...... 57
6.2 Upload Limit ...... 59
6.3 Delete Automation ...... 60
6.4 Server Metric Logging ...... 60

7 Conclusion 62
7.1 Future Scope ...... 62

References 65

List of Figures

1.1 Common Operating Platform[2] ...... 2

2.1 Docker Architecture[15] ...... 8
2.2 Virtual Machine Architecture[15] ...... 8
2.3 Docker Swarm Architecture[18] ...... 9
2.4 Point cloud based 3D Model ...... 11
2.5 Web Server Architecture[23] ...... 13
2.6 Django Model-Template-View (MTV) Architecture[29] ...... 14

3.1 Existing System Architecture[2] ...... 18
3.2 Proposed System Architecture ...... 22

4.1 Delete Automation Workflow ...... 43

6.1 Common Operating Platform (Proposed system) ...... 58
6.2 Processed Model (Proposed System) ...... 59
6.3 Snippet of delete automation workflow output on website container ...... 60
6.4 Snippet of delete automation workflow output on backend container ...... 60
6.5 RAM comparison ...... 61
6.6 CPU comparison ...... 61
6.7 Snippet of server metric logging workflow output ...... 61

List of Tables

6.1 Performance analysis of the uploads workflow ...... 59

Listings

4.1 Backend base Docker image ...... 26
4.2 Complete Backend Docker image ...... 27
4.3 Build Backend base image ...... 31
4.4 Build complete Backend image ...... 31
4.5 Cloudstor plugin install command ...... 32
4.6 Docker volume build command using Cloudstor plugin ...... 33
4.7 Running test container using Cloudstor based volume ...... 33
4.8 Bash script for copying files ...... 33
4.9 Subprocess function call ...... 34
4.10 Docker-compose file for COP ...... 34
4.11 Deploy Docker container stack for COP ...... 38
4.12 Initial snippet of settings.py (Website) ...... 38
4.13 Updated snippet of settings.py (Website) ...... 38
4.14 Initial snippet of settings.py (Backend) ...... 39
4.15 Updated snippet of pix4d.py ...... 39
4.16 Updated snippet of entwine.py ...... 40
4.17 Updated snippet of greyhound.py ...... 40
4.18 Updated database engine on settings.py (Website) ...... 41
4.19 Updated client max body size variable snippet ...... 42
4.20 Updated keepalive, proxy read and proxy send timeout variables snippet ...... 42
4.21 Supervisor configuration file ...... 44


4.22 Snippet of delete.sh ...... 45
4.23 Snippet of file-del.py (Website) ...... 45
4.24 Snippet of file-del.py (Backend) ...... 46
4.25 Supervisor configuration file ...... 48
4.26 Code snippet of metric.sh ...... 48
5.1 Command to clone code repository ...... 51
5.2 Command to change directory ...... 51
5.3 Docker build command to create base images for website and backend containers ...... 51
5.4 Docker build command to create complete images for website and backend containers ...... 52
5.5 Docker build command to create Pix4D image ...... 52
5.6 Jenkins configuration script 1 ...... 54
5.7 Jenkins configuration script 2 ...... 54
5.8 Code snippet of start.sh script ...... 55
5.9 Code snippet of start.sh script ...... 56
6.1 Code snippet to view Docker stacks ...... 57
6.2 Code snippet to view Docker services ...... 57
6.3 Code snippet to view Docker containers ...... 58

Chapter 1

Introduction

Distributed system architecture is a type of computing architecture where the components of the architecture are physically located at different locations but, to the end-user, they act as a single coherent system. Distributed cloud system architecture uses cloud-based technologies that are at different physical locations but communicate with each other to act as a single system.[1] This chapter discusses the Common Operating Platform[2], the problem statement and motivation for the research, and the basic concepts and definitions discussed throughout the work.

1.1 Common Operating Platform

The Common Operating Platform (COP)[2] is a web-based platform that allows the generation and visualization of 3D models. It also allows users to upload and share files. The application was developed by the University of Cincinnati Infrastructure Institute (UCII) for the Ohio Department of Transportation (ODOT). Before this research work, it was a single-server-based web application consisting mainly of two components: the website frontend and the website backend. The application was based on the distributed system architecture. The application

code for the website container was packaged inside a Docker[3] container whereas, for the website backend, it resided locally on the server. Also, for the storage needs of the application, it used the local hard drive of the server and, for the database storage, it used the PostgreSQL[4] database.

Figure 1.1: Common Operating Platform[2]

1.2 Problem Statement

Developing an application starts with the developer building it on their personal computer. Once the application is ready to run, the developer pushes the code to a server. At the beginning of the application life cycle, the application's needs are minuscule and hence, it is developed with a smaller set of resources. However, once the application is opened to end-users, its resource needs increase. Also, the application needs to be constantly maintained so that it is highly available to the users. Hence, scalability and maintenance are among the most important aspects of a software's lifecycle and should be focused on while developing it.

In this research work, the application we are taking into consideration is the Common Operating Platform (COP)[2]. The initial application was developed using

a distributed single-server system architecture. The currently proposed architecture is based on the distributed cloud system architecture. It mainly consisted of two modules: the frontend interface for the users to interact with and the backend workflow that performs the computational processing[2]. Both the frontend interface code and backend interface code are packaged in Docker[3] containers. The rest of the software modules of the backend workflow are also packaged in Docker containers. To improve the application's scalability and management, Docker Swarm[5] is used as the orchestration tool for the containers. It uses a Docker Compose[6] file to deploy the stack of containers. Since these containers are spawned on multiple servers, shared storage is essential and hence, Azure File Share[7] has been integrated to incorporate shared storage and increase the upload capacity. The application needed to be integrated into the Ohio Department of Transportation's (ODOT) internal network and, to fulfill one of their requirements, the database connectivity for the application has been shifted to the Oracle Database[8]. The existing issue regarding the deletion of files in the storage of the application has been resolved and is discussed in detail. The application is now enabled to maintain server logs. For monitoring the resources being used on the servers, server metric logging is enabled and displayed in this work.

1.3 Motivation

Distributed computing has been significantly improved by recent technological strides. The advancements made in cloud computing have improved the latency and scalability of the infrastructure used in various distributed systems. The advent of container-based and orchestration-based technologies has made it easier to package the application code and maintain the entire application, as it provides a layer of isolation and keeps the application code environment consistent.[9] Since these advancements have been made quite recently, there is a technological gap between applications observed today and the ones observed prior to the advancements

made. It is necessary to upgrade the older applications with these new technologies, as doing so will improve maintenance and scalability and decrease the cost of buying expensive hardware. This research discusses the architecture improvements made to one such application, the Common Operating Platform (COP), allowing it to be easily deployed, managed and scaled as the number of application users increases.

1.4 Existing System

The Common Operating Platform (COP)[2] is a dynamic web application created to provide a common interface for processing various workflows that require expensive computer hardware. The services provided by it are as follows:

• Process uploaded projects consisting of aerial imagery to produce densified point-cloud-based 3D models.

• View densified point-cloud-based 3D models on the application using a 3D model viewing software package Potree.[10]

• Upload, store and share multiple files, up to 4 GB at a given instance of time.

Even though the application served its purpose, there were some shortcomings, discussed below:

• The architecture of the platform is designed in such a way that not all components are scalable and hence it cannot be scaled to a multi-server-based system if the need arises.

• After server maintenance is performed on the servers running the application, the application does not restart automatically. It needs to be restarted manually by logging on to the servers.


• The upload limit of 4GB makes it difficult to process models consisting of project datasets greater than 4GB even though the application is capable of processing such models.

• Files deleted from the application interface remain present on the file storage of the application.

• There is no tool to log the server metrics to monitor high system usage on the servers.

1.5 Proposed Approach

The purpose of this research work is to improve the scalability and performance of an existing system, which in this case is the Common Operating Platform (COP)[2], by making architectural changes and adding additional features to the application. The issues mentioned in the previous section are addressed below:

• The architectural changes are brought about by the addition of the container orchestration tool Docker Swarm[5], the cloud storage Azure File Share[7] and the Oracle Database[8]. This will allow the system to be a distributed cloud-based system that will be highly available, multi-server compatible and easily scalable.

• Also, the use of container orchestration tool Docker Swarm along with the automation server tool Jenkins[11] allows the application to be automatically restarted in the event of a failure occurring.

• Integrating the cloud storage tool Azure File Share and making changes to configuration files will help in increasing the upload limit.

• The addition of features like delete automation will help improve the performance of the application and help delete redundant files on the file storage.


• Server metric logging will help monitor the application throughout its lifecycle.

The core functionality of the application will remain the same: the website interface will allow the users to upload files to the application and view 3D models, and the backend module will process the uploaded images to produce 3D models. The website interface will run on the Django[12] framework and will be packaged inside a Docker container. The backend module will be upgraded from code maintained directly on the server to code packaged inside a Docker container. To process 3D models, the application will use Pix4DEngine Server[13] and OpenDroneMap[14].

Chapter 2

Basic Concepts and Definitions

This section will cover all the basic concepts and definitions discussed throughout the research work.

2.1 Container and Container Orchestration

Container technology is a software technology that packages the code of the application along with its dependencies so that the application can run quickly and reliably from one system to the other.[15] Since containers run on a container engine and do not need a hypervisor, they are lightweight compared to virtual machines. Container orchestration is the methodology to deploy, scale and manage these containers. It allows a system running multiple containers to appear as a single system to users. It also redeploys containers if they run into failures. Its purpose is to manage the lifecycle of containers.[16] In this section, we will discuss Docker[3] and Docker Swarm[5] as the container and container-orchestration technologies respectively.


2.1.1 Docker[3]

Docker[3] is the most popular and widely used container-based technology. Almost all cloud computing providers offer access to Docker. It uses the Docker engine to run the containers. Since it is a container-based technology, it creates lightweight and reliable containers which can be run on systems running different operating systems such as Linux, Microsoft Windows and macOS. It allows containers built on a particular operating system to run on a completely different operating system. Figure 2.1 shows the architecture of Docker and Figure 2.2 shows the architecture of a hypervisor-based virtual machine.

Figure 2.1: Docker Architecture[15]
Figure 2.2: Virtual Machine Architecture[15]

As can be clearly seen in the first figure, Docker runs on top of the operating system whereas, in the case of a virtual machine, it runs on top of a hypervisor which in turn runs on top of an operating system. This allows Docker to run containers as isolated processes and hence, use fewer system resources. To run their applications using Docker, software developers need to create Docker images of their application code. These images are used to create instances of the application in the form of containers. The images are built step by step as specified in a Dockerfile[17], which is a configuration file for Docker to create images. Since the build is performed one step at a time, the result of each step is cached and hence, Docker decreases disk usage and increases the reusability of that step. This eventually results in faster build times for the images.


2.1.2 Docker Swarm[5]

Docker Swarm[5] is a container orchestration tool offered by Docker itself. Since it is promoted by Docker, it uses the same command-line commands as Docker. Being a container orchestration tool, it manages and deploys containers at the lowest level of the architecture hierarchy. At a higher level, as the name suggests, it manages a group of Docker hosts so that they run as a single virtual host where it deploys containers. Figure 2.3 shows the architecture of Docker Swarm. As seen in the image, there can be multiple servers acting as managers that manage multiple worker servers. These worker servers are the Docker hosts that were mentioned previously. Each of them is a server running the Docker engine itself to run containers. If a container fails at one worker, it will be redeployed at another server. The advantage of this approach is that all the commands and APIs will work in clusters of Docker hosts the same way they work on a single Docker host.

Figure 2.3: Docker Swarm Architecture[18]

Docker Swarm uses the Docker Compose[6] file as the configuration file for deploying an application consisting of many Docker containers. Each Docker container in this setup is referred to as a Docker Service[19]. It also manages the volumes attached to the containers and creates a network for the containers to interact with each other.
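As a simple illustration of how Swarm treats containers as replicated services, the following minimal Python sketch uses the Docker SDK for Python; the image name, service name and replica count are arbitrary placeholders and are not part of the COP configuration.

import docker
from docker.types import ServiceMode

# Connect to the local Docker engine; this must run on a swarm manager node.
client = docker.from_env()

# Create a replicated service; the swarm schedules the replicas across the
# available worker nodes and respawns them automatically on failure.
service = client.services.create(
    "nginx:alpine",                              # placeholder image
    name="demo-web",
    mode=ServiceMode("replicated", replicas=2),
)

# Each replica is a task (a container) placed on some node in the swarm.
for task in service.tasks():
    print(task["Status"]["State"])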


2.2 3D Modeling

3D modeling is the process of representing an object or surface in 3D space using mathematical calculations performed by specific software.[20] 3D models are maintained in files, and these files have different formats to store the data. There are many specialized applications for generating 3D models, of which some are proprietary and the rest are open-source software. Here, Pix4DEngine[13] and OpenDroneMap[14] will be discussed along with point clouds, one of the file formats in which 3D models are stored.

2.2.1 Pix4DEngine Server[13]

Pix4DEngine[13] Server is a proprietary software module created by Pix4D. It is based on the core engine which another application named Pix4DMapper[21] uses to generate 3D models. In order to use this software, an Application Programming Interface (API) has been created which allows software developers to communicate with the engine. It is a highly customizable software module as it uses templates to create the desired output. It produces outputs in various file formats as per user needs.

2.2.2 OpenDroneMap[14]

OpenDroneMap[14] is an open-source software module that generates 3D models from aerial imagery datasets. It also has a web-based software portal to process and visualize 3D models. Application programming interfaces in multiple programming languages are available as well. A Docker image of the software already exists on the public Docker image repository.


2.2.3 Point Clouds

Point clouds are collections of points that are mapped in 3D space to represent an object or surface. Most 3D processing software produces point clouds as the output of processing 3D models. Figure 2.4 displays a 3D model in point cloud format. This is a model processed using aerial images from a drone. On zooming in on the model, it can be observed that the model essentially consists of points in 3D space. To view the surface of 3D models, surface reconstruction using complex calculations needs to be done.

Figure 2.4: Point cloud based 3D Model

2.3 Microsoft Azure[22]

Microsoft Azure[22] is the cloud-technology platform from Microsoft. It provides a range of cloud-service-based products spanning software, platform and infrastructure services. Microsoft announced the launch of Azure in 2008 and it currently offers about 200 product services.[22] In this chapter, we will go over one of their services in the storage domain named Azure File Share[7].


2.3.1 Azure File Share[7]

Azure File Share[7] is one of the storage services offered by Microsoft Azure[22]. It provides a file-share-based storage system that can be managed using the Azure Command Line Interface (CLI). For Docker containers, there is a Docker volume plugin available to mount an Azure File Share as a Docker volume. For on-premise systems running operating systems like Windows, macOS and Linux, it can also be mounted as a volume. It uses the Server Message Block (SMB) and Network File System (NFS) protocols to communicate with the user's systems.[7] One of the major advantages of using Azure File Share is that it provides a shared storage service that can be integrated easily into an application with storage needs.

2.4 Web-server

A web-server is a computer system that runs web-based applications and handles HTTP-based requests. The term web-server has two meanings with respect to the context it is used in. Web-server as hardware is a computer system that hosts a web application and stores all the application code. Web-server as software is an HTTP server that can interpret URLs and communicate using the HTTP protocol to handle requests and serve data.[23] This data can be text or multimedia files. A web-server can host multiple web applications.

When a user tries to connect to a web application, they use a web URL in their browser to connect to the application. A Domain Name System (DNS) server then converts the URL to an IP address which is used as an identifier by the web-server. The purpose of the web-server is then to deliver the requested content to the user. The web-server acts as an entity between the client requests and the application code so that it can interpret the request and deliver the response to the user by communicating with the application code.


Figure 2.5: Web Server Architecture[23]

2.4.1 Nginx[24]

Nginx[24] is one of the most popular web servers, used by a large number of web applications. It is an open-source web-server that handles HTTP requests and delivers application data, which can be static or dynamic. In addition to working as a web-server, it can be configured to operate as a proxy server for email, a reverse proxy server and a load balancer to distribute incoming web traffic among different servers.[24] It is known for operating with low memory usage and high request-handling capacity. It has a sophisticated event-driven architecture where a master process creates worker processes to manage and process network connections.[25]

2.5 Web Framework

A web framework is a collection of code libraries that helps in the easy development of reliable, scalable and manageable web applications.[26] It uses a definite pattern in coding style, making it easier for software developers to write and maintain the code. It also reduces the complexity of identifying errors and bugs, as it follows a template for code to be written, making it easy to find the origin of an error or bug. This template serves the web framework's purpose of hiding the regular and repetitive code required to perform basic functionalities in a web application. These functionalities usually involve URL routing, input form management and validation,

database connection and web security.[26]

2.5.1 Django[27]

Django[27] is an open-source Python web framework used for the development of web applications. It has many advantages over other web frameworks as it is simple to implement, fast to deploy and highly secure and scalable for developing web applications. It is based on the Model-View-Template (MVT) architecture. This is similar to the Model-View-Controller (MVC) architecture, but since Django itself performs the tasks usually done by the controller, using templates, the architecture is slightly different.[28]

Figure 2.6: Django Model-Template-View (MTV) Architecture[29]

Figure 2.6 displays the architecture of Django. The model acts as the mediator between the website interface and the database. It manages and maintains the data for the web application. The view acts as a link between the Model and Template components of the architecture. It consists of the core logic of the code which handles all the functionalities of the application. The template consists of the user interface code which deals with how the application appears to the user. It essentially consists of static HTML and CSS code.[28] These three components provide a template for the developer to easily develop the code and deploy the application.
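As a minimal, hypothetical sketch of the MVT pattern (illustrative only, not the actual COP code), a model and a view rendering a template might look as follows:

# models.py -- the Model: defines the database schema for a hypothetical project table.
from django.db import models

class Project(models.Model):
    name = models.CharField(max_length=100)
    status = models.CharField(max_length=20, default="uploaded")
    created = models.DateTimeField(auto_now_add=True)

# views.py -- the View: core logic that queries the model and selects a template.
from django.shortcuts import render
from .models import Project

def project_list(request):
    projects = Project.objects.order_by("-created")
    # project_list.html is the Template: static HTML/CSS rendered with this context.
    return render(request, "project_list.html", {"projects": projects})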


2.6 Database

A database is a collection of data that is organized systematically and can be electronically stored as well as accessed using a computer. A database is managed by a Database Management System (DBMS), which is software used to manage data. The database usually consists of a collection of files. The database contains multiple tables consisting of rows and columns of cells. Each of these cells is made up of some form of text data. To store, manipulate and retrieve this data from the database, a programming language called SQL is used. SQL is used to create queries which communicate with the database to perform operations on the data. Using databases allows web applications to store large amounts of data which can be retrieved in a matter of milliseconds as per user request.
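The following self-contained Python snippet uses the built-in sqlite3 module purely for illustration (it is not the database used by the COP) to show how SQL queries store and retrieve rows from a table:

import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table, insert a row and query it back using SQL.
cur.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
cur.execute("INSERT INTO projects (name, status) VALUES (?, ?)", ("bridge_scan", "processing"))
conn.commit()

cur.execute("SELECT name, status FROM projects WHERE status = ?", ("processing",))
print(cur.fetchall())   # [('bridge_scan', 'processing')]
conn.close()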

2.6.1 Oracle Database[8]

Oracle Database[8] is a Relational Database Management System (RDBMS) used to create and manage a database. It is available in a free (Express) edition as well as an enterprise edition. It is one of the most widely used database systems at the enterprise level. It is cross-platform and can run across multiple operating systems, and it is ACID (Atomicity, Consistency, Isolation, Durability) compliant, which makes it reliable and provides data integrity.[8]

2.7 Jenkins[11]

Jenkins is an automation server that is used to automate tasks related to building and deploying applications.[11] It is developed using the Java programming language. It is a Continuous Integration Continuous Deployment (CICD) tool and hence allows developers to commit changes to the source code in a code repository

and build the application in a continuous workflow.[30] Each commit made to the code repository is monitored by it and whenever there is a commit made, it starts the deployment of the application. For this, it starts with the build for the application and deploys it once the build is successful. If the build fails, the errors are logged and the developers are notified. Its installation process is very easy and it can be configured using its web interface. It is an open-source tool and hence has a lot of plugins available to use as per application requirements.[30]
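Besides building on every commit, a Jenkins job can also be triggered remotely through its REST API. The sketch below assumes a hypothetical Jenkins URL, job name, user and API token; it is not part of the COP configuration.

import requests

# Hypothetical values -- replace with a real Jenkins URL, user and API token.
JENKINS_URL = "https://jenkins.example.com"
JOB_NAME = "cop-deploy"
USER, API_TOKEN = "builder", "xxxxxxxxxxxxxxxx"

# POST to the job's build endpoint; Jenkins queues a new build of the job.
resp = requests.post(
    f"{JENKINS_URL}/job/{JOB_NAME}/build",
    auth=(USER, API_TOKEN),
    timeout=10,
)
resp.raise_for_status()
print("Build queued at:", resp.headers.get("Location"))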

Chapter 3

System Design and Architecture

In this chapter, we will discuss the previous and currently proposed architecture for the Common Operating Platform.

3.1 Existing System Architecture

The Common Operating Platform (COP)[2] is based on the microservices software development technique. In this technique, the system is made up of several services. Each one of these services performs a certain task which contributes towards the completion of a bigger task.[2] The COP mainly consists of two modules where each module can be further divided into smaller components. The website interface module provides users with an interface to interact with the application by uploading files and viewing 3D models. The backend processing module processes the uploaded files to create 3D models and the user cannot directly interact with this module. Figure 3.1 displays the previous architecture of COP.


Figure 3.1: Existing System Architecture[2]

As observed in the figure, there are two major components: the website interface and the backend processing module. Apart from that, there are four other Docker containers namely HTTP server, Entwine[31], Pix4D[21] and ODM[14]. The website interface consists of the Application Programming Interface (API), Graphical User Interface (GUI) and Django[27] web framework. The backend processing module consists of the task pipeline, file storage, database server and backend controller. A summary of each of the components is listed below. For more detailed information please refer to the COP documentation.[2]

1. Application Programming Interface (API): The COP makes use of the Representational State Transfer (REST) API for communication between com- puter systems over a network.[2] The API contains all the operations the ap- plication can perform. The API primarily uses two HTTP methods namely GET and POST, and the output to the API queries is in the Javascript Object Notation (JSON) format. It also uses an API key for authentication which needs to be attached to the header of the HTTP request.[2] There are in total four APIs in the COP namely uploads, list directories, list files and download files.


2. Graphical user interface (GUI): GUI provides the users with an interface via which they can access COP and its underlying APIs to perform tasks. The GUI allows users to use a virtual file system for uploading and maintaining files. This file system is based on the Linux file system.[2] The GUI provides users with username-password authentication and a sign-up page for registration. It also provides users options to create and delete files and folders to store on the application. For 3D model processing, it allows users to create specific folders using either Pix4DEngine Server[13] or OpenDroneMap[14] and upload files to them for processing. Status updates regarding processing are posted under the folder as well as the backend jobs tab. The backend jobs tab displays all the 3D processing projects created and processed, and maintains links to view and download files of the processed model. The last view provided by the GUI is the settings view. It allows users to change the password and create an API key used by the backend processing module, to authenticate access to the website interface.

3. Django[12]: Django web framework manages the entire website interface of the application. The API and GUI need to interact with the Django framework to perform tasks intended to be performed by them. The requests received from the API and the view change requests received from GUI are managed by Django. Django provides boilerplate code templates to handle basic operations like responding to HTTP requests and connecting to the database. As it is based on the MVT architecture, the template consists of the code necessary for the GUI representation, the model consists of code which details the database schema and the view consists of the code logic required to handle API-based HTTP requests depending on their type.

4. Unified Pipeline: The unified pipeline is responsible for maintaining the queue of tasks it needs to perform to process a 3D model. There are three sets of task queues which are for Pix4DEngine Server processing, OpenDroneMap processing and point cloud visualization. Each task queue consisted of five,


four and two tasks respectively. These tasks are Python scripts located in the Backend Controller. Its purpose is to maintain the task queue.

5. File Storage: The local storage of the server is used as the file storage for the COP. It is used to store all the uploaded and processed 3D model files. Also, all the logs are stored in the file storage.

6. Database Server: PostgreSQL database is used as the database server for the COP. It is an open-source relational database system used to store data.[4] The database needs to be created first on the server and its credentials need to be given to Django, so that it can connect to the database and build the necessary tables for the application using the database schema mentioned in Django models.

7. Backend Controller: The backend controller, as the name suggests, controls the entire processing workflow of the application. It consists of worker processes that query the database to look for projects to process. Once it finds a project to process, it initiates the project as a task and populates all necessary values in the code. Using the unified pipeline, all tasks to be performed are populated and are sequentially invoked. The backend controller consists of all the process workflows which perform the tasks mentioned in the unified pipeline. After all tasks are performed, the database tables are updated with the status of the project and the user is notified on the website interface with the same status. An illustrative sketch of this worker loop is given after this list.
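To make this control flow concrete, the following is a purely illustrative Python sketch of such a worker loop; the helper objects, function names and SQL statements are hypothetical and do not reflect the actual COP code.

import time

def fetch_pending_project(db):
    # Hypothetical helper: return the next project awaiting processing, or None.
    return db.query_one("SELECT id, engine FROM projects WHERE status = 'pending'")

def worker_loop(db, pipeline):
    while True:
        project = fetch_pending_project(db)
        if project is None:
            time.sleep(30)          # nothing to do; poll the database again later
            continue
        # Populate the task queue for the chosen engine (Pix4D, ODM or visualization)
        # and invoke each task sequentially, as the unified pipeline does.
        for task in pipeline.tasks_for(project.engine):
            task.run(project)
        # Record the result so the website interface can show the status to the user.
        db.execute("UPDATE projects SET status = 'done' WHERE id = ?", (project.id,))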

Out of these components, the first three belong to the website interface and hence are maintained in a Docker container which we will refer to as the website container. Apart from this, there were four containers mentioned previously and a summary of each of them is given below:

1. Pix4D: This container contains the Pix4DEngine Server[13] and the Python scripts necessary to interact with the engine. The project folder is mounted as a volume to the container.


The "images" folder in the project folder is taken as input and the processed 3D model is stored in an "output" folder. All the logs are stored in the "logs" folder inside the project folder.

2. Entwine: This container consists of the Entwine module, which is a data organization library for massive point cloud files.[31] It performs the necessary pre-processing steps on the point cloud file so that it can be viewed on the application. It has two volumes mounted to it. The LAS file output from the Pix4D container is mounted as the input volume while the entwine directory in the file storage is mounted as the output volume.

3. ODM: This container consists of the OpenDroneMap[14] module. The entire project folder is mounted to this container. It takes the uploaded images as input and stores the processed output in the same folder. The output is a collection of multiple folders. The output from this container needs no pre-processing steps to be viewable on the application, as the OpenDroneMap module performs them while processing.

4. HTTP server: This container consists of a basic HTTP web server. It is used to host the point cloud viewing application named Potree[10] Viewer. The application code for the Potree Viewer is mounted to this container.

All these components together make up the Common Operating Platform. The architecture we discussed in this chapter is designed to run on a single-server-based system. This is because only some of the components in this architecture are scalable.

3.2 Proposed System Architecture

As mentioned before, the existing system architecture of the COP is designed to run on a single server, as not all the components are scalable. Also, there are limitations

on the size of files uploaded and the deletion of files on the file storage. To resolve these issues and make the application multi-server compatible, certain architectural changes were needed. Figure 3.2 displays the new proposed architecture for the Common Operating Platform (COP)[2]. This architecture is based on the Continuous Integration Continuous Deployment (CICD) principle for microservice-based architecture applications.

Figure 3.2: Proposed System Architecture

As observed in the figure, the two modules, known as the website interface and the backend processor, are tweaked a little and more components are added to the system design. The backend processing module is packaged into a Docker container. The file storage has been shifted from local storage to the cloud-based Azure File Share[7]. The database has been updated from the Postgres[4] database to the Oracle database[8] based on the ODOT IT team's requirement. The additional components added are the load balancer, Team Foundation Server (TFS), Jenkins[11] and the Docker Swarm Manager[5]. They are summarized in detail below:

1. Team Foundation Server (TFS): The TFS is a code repository that is used to store and maintain the source code of applications and manage the software development teams working on the development of their respective


applications. It is a source control tool mainly used for version control, issue tracking and application lifecycle management.[32] The entire source code of the application is now maintained here which would be used by the CICD automation tool Jenkins to deploy the application.

2. Jenkins[11]: Jenkins is a CICD-based automation server tool used for continuous deployment of applications by using custom scripts that take a snapshot of the available source code and continuously integrate it into the deployed application.[33] Once a change is made to the source code in TFS, Jenkins will be triggered and will use the custom scripts to take a snapshot of the new source code and deploy it on the servers.

3. Load Balancer: A load balancer is server software used to redirect requests to the application. It is used to manage a large number of requests made to an application. This is done by redirecting the requests to multiple copies of the same application. To scale an application into a multi-server-based system, the application architecture must include a load balancer. The load balancer used in this setup is configured by the ODOT IT team and offered by Microsoft Azure. To link the load balancer with the application, its credentials are maintained in the Docker Compose configuration file which is used to deploy the application.

4. Docker Swarm Manager[5]: The Docker Swarm manager is the container orchestration software used to deploy the application containers. The manager node has access to all the worker nodes. It uses the Docker Compose file as the configuration for spawning the containers on the worker nodes. All the necessary variables and volumes mounted to the containers are mentioned in this file. For the COP, the Docker Compose file is used to spawn two containers, the website container and the backend container, since they need to keep running all the time. The other containers needed for processing and visualization are spawned by the backend container as per the task requirement. To


achieve this workflow of running Docker containers from a Docker container, the Docker out of Docker (DooD) architecture is used; a minimal sketch of this pattern is given after this list. The website container and backend container have been displayed on server 1 and server 2 respectively in the architecture diagram. However, the containers may be spawned on either server. Docker Swarm spawns the containers based on the availability of resources on the servers. Also, it monitors the health of the containers and respawns them if they run into errors.
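As a minimal sketch of the DooD pattern (the image name and volume path are placeholders rather than the COP's actual values), a Python script inside the backend container can talk to the host's Docker engine through the mounted socket and spawn a sibling container:

import docker

# Because /var/run/docker.sock is mounted into the backend container,
# from_env() connects to the host's Docker engine rather than a nested one.
client = docker.from_env()

# Spawn a sibling processing container on the host (Docker out of Docker);
# it runs next to the backend container instead of inside it.
container = client.containers.run(
    "opendronemap/odm",                                      # placeholder image
    volumes={"/data/project1": {"bind": "/project", "mode": "rw"}},
    detach=True,
)
print(container.id)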

All these components together make the CICD approach possible for the application. They also allow the application to scale up in a multi-server configuration. The discussion of the components in this chapter is just a high-level representation of them. Their intricacies and other components, along with their implementation, are discussed in detail in the next chapters. This also includes solutions to the existing underlying issues and other improvements made to the application.

Chapter 4

Implementation of Proposed System

This chapter discusses all the steps needed to implement the architecture proposed in the previous chapter. There are a total of two sub-chapters, namely multi-server architecture and other enhancements. The first sub-chapter will discuss the intricacies and implementation of the technologies discussed in the previous chapter. The second sub-chapter will discuss the underlying issues and their solutions along with the improvements made to the application.

4.1 Multi-server Architecture

In this chapter, the implementation of the multi-server architecture for the Common Operating Platform (COP)[2] is discussed in detail. Each step in the process is discussed as a detailed sub-chapter. There are a total of five sub-chapters which will be covered from here on.


4.1.1 Dockerization of Backend Module

In this chapter, the packaging of the COP backend processing module in a Docker container is discussed in detail. The existing architecture of the COP backend processing module was designed to run locally on the server hosting the application. This was feasible as long as the application only needed to run on a single server. However, as for any other application, it was necessary for the COP to be highly scalable without any restrictions and hence, be able to scale with a multi-server configuration. To achieve this, it is necessary to package the backend module in a Docker container. In order to create a Docker container, a Dockerfile needs to be created. This file is the configuration file used to build the Docker image for the container. To make the Docker image easy to maintain, the backend module was packaged using two Dockerfiles. Listings 4.1 and 4.2 display the two Dockerfiles.

FROM ubuntu:16.04

MAINTAINER Sohan Karkera "[email protected]"

RUN apt-get -y update
RUN sleep 5

RUN apt-get -y --ignore-missing install python python-pip \
    libpq-dev vim curl git tzdata python3 python3-pip \
    apt-transport-https \
    ca-certificates \
    gnupg-agent \
    software-properties-common \
    supervisor

# Correct timezone
ENV TZ 'America/New_York'

RUN echo $TZ > /etc/timezone \
    && ln --symbolic --force /usr/share/zoneinfo/$TZ /etc/localtime \
    && dpkg-reconfigure -f noninteractive tzdata

RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
RUN add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"

RUN apt-get -y update
RUN apt-get -y install docker-ce-cli
RUN pip install --upgrade pip

RUN apt-get install -y cron alien
RUN mkdir -p /tmp/bin
ADD http://yum.oracle.com/repo/OracleLinux/OL7/oracle/instantclient/x86_64/getPackage/oracle-instantclient19.3-basic-19.3.0.0.0-1.x86_64.rpm /tmp/bin
ADD http://yum.oracle.com/repo/OracleLinux/OL7/oracle/instantclient/x86_64/getPackage/oracle-instantclient19.3-sqlplus-19.3.0.0.0-1.x86_64.rpm /tmp/bin
ADD http://yum.oracle.com/repo/OracleLinux/OL7/oracle/instantclient/x86_64/getPackage/oracle-instantclient19.3-devel-19.3.0.0.0-1.x86_64.rpm /tmp/bin

RUN alien -i /tmp/bin/oracle-instantclient19.3-basic-19.3.0.0.0-1.x86_64.rpm
RUN alien -i /tmp/bin/oracle-instantclient19.3-sqlplus-19.3.0.0.0-1.x86_64.rpm
RUN alien -i /tmp/bin/oracle-instantclient19.3-devel-19.3.0.0.0-1.x86_64.rpm

RUN apt-get install -y libaio1
RUN python -m pip install cx_Oracle==5.2

Listing 4.1: Backend base Docker image

FROM dockerdtrdev.dot.state.oh.us/odotdev/uas-base:backend_2

RUN mkdir -p /opt/backend/backend_controller/ /opt/backend/services/ /opt/backend/utils /uas

ADD backend_controller /opt/backend/backend_controller/
ADD services /opt/backend/services/
ADD utils /opt/backend/utils

ADD run_worker.sh /opt/backend/
ADD start_greyhound.bat /opt/backend/
ADD start_greyhound.py /opt/backend/
ADD start_website.bat /opt/backend/
ADD start_website.py /opt/backend/
ADD start_worker.bat /opt/backend/

ADD file-del.py /opt/backend/
ADD worker.py /opt/backend/

RUN chmod +x /opt/backend/run_worker.sh

RUN pip3 install requests
RUN pip3 install virtualenv

RUN virtualenv -p python3 /opt/backend/venv
RUN . /opt/backend/venv/bin/activate && \
    pip3 install docker requests && \
    pip2 install supervisor requests docker && \
    deactivate
RUN pip3 install docker && \
    pip2 install supervisor
RUN pip2 install cx_Oracle==5.2

ADD worker1.conf /etc/supervisor/conf.d/
ADD metrics.conf /etc/supervisor/conf.d/
ADD delete.conf /etc/supervisor/conf.d/

RUN mkdir -p /data/uas/logs/supervisor/ && touch /data/uas/logs/supervisor/backend1.log && touch /data/uas/logs/greyhound.log && touch /data/uas/logs/supervisor/metrics_backend.log

WORKDIR /opt/backend/
ADD runner.sh runner.sh
RUN chmod +x /opt/backend/runner.sh
RUN chmod +x /opt/backend/services/copy.sh

RUN sed -i 's/\r//' services/copy.sh
RUN apt-get install -y sysstat
ADD metrics.sh /opt/backend/
ADD delete.sh /opt/backend/
RUN chmod +x /opt/backend/metrics.sh /opt/backend/delete.sh

CMD ["/bin/bash", "/opt/backend/runner.sh"]

Listing 4.2: Complete Backend Docker image

To understand how the Dockerfile is interpreted, it is necessary to know the general syntax for instructions. The general syntax of instructions used for the Dockerfiles is listed below.[17]

1. FROM: It specifies the parent image used for building the container.

2. RUN: It executes any command specified on top of the current image and commits the results which will be used in the next step.

3. ENV: It is used to set environment variables.

4. ADD: It is used to copy new files and directories to the container image.

5. WORKDIR: It sets the working directory for all instructions mentioned after the command.

6. CMD: It is used to specify the default command the container will use on execution.

The first Dockerfile essentially maintains all the software modules needed

to be installed in the container. The important modules installed are mentioned below:

1. Python and pip (versions 2 and 3): These are system modules that need to be installed in the container, as almost the entire backend processing module is based on Python scripts and Python software modules.

2. Supervisor[34]: This is a system module used for running the worker processes.

3. Docker: To run Docker containers from within the container, the Docker CLI needs to be installed.

4. Oracle instant client modules: To connect to the Oracle Database from the container command line, these modules need to be installed.

5. Libaio1: System module necessary to install the cx_Oracle Python module[35].

The second Dockerfile is mainly used to copy the backend module code files and create the necessary directories in the container. Some Python modules needed to run the backend module are also installed in this file. They are listed below.

1. Virtualenv - It is a Python module used to create virtual environments. Here it is used to create a virtualenv for running the backend module scripts.

2. Docker - It is the Python module for Docker. It allows running Docker commands using Python. It is mainly used in the Python scripts to invoke other Docker containers.

3. cx_Oracle - It is a Python module used to connect to the Oracle database via Python scripts (a short connection sketch follows this list).
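For illustration, a minimal cx_Oracle connection might look like the sketch below; the host, service name and credentials are placeholders, and the actual COP deployment supplies these values through its configuration.

import cx_Oracle

# Placeholder connection details -- in practice these come from environment variables.
conn = cx_Oracle.connect("cop_user", "cop_password", "dbhost.example.com:1521/copdb")

cur = conn.cursor()
cur.execute("SELECT table_name FROM user_tables")
for (table_name,) in cur:
    print(table_name)

cur.close()
conn.close()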

The first Dockerfile is used to create the base image consisting of the required software modules and the second Dockerfile is used to add all the necessary

code files and other remaining software modules to complete the Docker image. This Docker image is then used to run the container. The command listed in Listing 4.3 is used to create the base image.

docker build -f /home/a50196227/UAS/Ubuntu-Base-Images/Dockerfile.backend -t dockerdtrdev.dot.state.oh.us/odotdev/uas-base:backend_2 .

Listing 4.3: Build Backend base image

After building the base image, the command listed in Listing 4.4 is used to create the complete backend Docker image. After the Docker image is built, it can be used to spawn the backend container.

docker build -t dockerdtrdev.dot.state.oh.us/odotdev/uas-backend .

Listing 4.4: Build complete Backend image

4.1.2 Centralized Storage

When an application is run in a multi-server-based system, all the components must be well coordinated with each other. After the creation of the backend container, both the major modules of the COP[2] are in Docker[3] containers. Thus, both modules use local-storage-based volume mounts. To run an application in a multi-server system, one of the biggest issues faced is the file storage of the application. Since we are running multiple containers which share the same set of files, it is necessary to have shared storage space in a multi-server-based system. Otherwise, the containers will only be able to access files locally stored on the server they are running on. To use shared volumes with the COP application, two approaches were studied, which are discussed in detail below.

1. Network File System (NFS): Network File System is a distributed file system protocol that is used to access files over a computer network.[36] A


hard drive or multiple hard drives can be used to store files which can be accessed by the application as NFS-based volumes. In order to use shared volume storage for the COP, an NFS-based volume mount was created by ODOT. To use the mount for the COP, the volume was mounted to the containers and the application was deployed. NFS volumes work like local storage volumes and hence, they are mounted using the same method.

After mounting the NFS mount as a shared volume, the COP was successfully deployed. However, performance issues were observed after deploying it under this architecture. After testing the upload functionality of the application, it was noticed that the time taken to copy files from one location to the other on the NFS mount was significantly long. This hampered the application, as it was not feasible for the users to face longer wait times to process a project. To get around the issue, another shared file storage technology had to be adopted. Thus, cloud-based Azure File Share was considered; it is discussed in detail in the next sub-chapter.

2. Azure File Share[7]: Azure File Share is one of the many storage-based products offered by Microsoft Azure[22]. As the name suggests, it is a file-share-based system that operates over the Azure cloud network. Since it is a cloud-based platform, it cannot be directly used as a volume mount for the container. To incorporate Azure File Share, a Docker volume plugin named the Cloudstor plugin[37] needs to be installed on the server. This plugin allows Docker containers to directly use an Azure File Share account to create multiple volumes for storage. Listing 4.5 shows the command to install the Cloudstor plugin. The AZURE_STORAGE_ACCOUNT_KEY and AZURE_STORAGE_ACCOUNT variables need to be updated with the unique values for a given user.

docker plugin install --alias cloudstor:azure --grant-all-permissions docker4x/cloudstor:19.03.0-ce-azure1 CLOUD_PLATFORM=AZURE AZURE_STORAGE_ACCOUNT_KEY="" AZURE_STORAGE_ACCOUNT="" DEBUG=1

Listing 4.5: Cloudstor plugin install command

After installing the plugin, multiple Docker volumes can be created to mount on the Docker containers. Listing 4.6 shows the command to create a sample Docker volume using the Cloudstor plugin.

docker volume create -d cloudstor:azure test

Listing 4.6: Docker volume build command using Cloudstor plugin

This volume can be directly mounted to the containers just like a Docker volume. Listing 4.7 shows an example of a container being run using an Azure File Share volume.

docker run -v test:/data ubuntu

Listing 4.7: Running test container using Cloudstor based volume

Just like in the example, multiple Azure File Share volume mounts need to be created, based on the number of storage mounts needed, and mounted onto the COP. The COP was successfully deployed using this Azure File Share storage option. However, instead of creating volumes manually, they were configured in the "docker-compose" file which will be discussed in the next chapter. To get these volumes to work, a software issue had to be resolved. Error messages were recorded due to the usage of the Python-based copy2 function to copy files. In order to resolve this issue, a bash script was created to copy the files. This bash script was executed using the subprocess Python module in place of the copy2 call to copy the files. Listings 4.8 and 4.9 show the bash script and the function call to execute the bash script in task.py of the backend module respectively.

#!/bin/sh
cp -r $1 $2
exit 0

Listing 4.8: Bash script for copying files

subprocess.call(["/opt/backend/services/copy.sh", os.path.join(self.uploads_directory, afile), self.files_directory])

Listing 4.9: Subprocess function call

4.1.3 Docker Compose file creation

Docker Compose uses a configuration file to spawn multiple containers together. In order to automate the deployment of the COP application as a whole, a Docker Compose file needs to be created. After the creation of the backend container, both the major modules of the COP are packaged in Docker containers. The creation of a docker-compose file will automate the deployment of the COP using the two containers. Listing 4.10 shows the docker-compose file for the COP.

version: "3.7"
services:
  uas-website:
    image: ${FRONTEND_IMAGENAME}:${FRONTEND_IMAGETAG}
    env_file:
      - ${APP_ENV}.env
    volumes:
      - logs:/opt/uas/logs:rw
      - tmp:/tmp:rw
      - uploads:/opt/uas/app/uploads:rw
      - media:/opt/uas/app/media:rw
      - utils:/opt/uas/utils:rw
      - processed_data:/opt/uas/app/backend_controller:rw
      - mesh_data:/opt/uas/app/mesh_data:rw
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.platform.os == linux
      labels:
        com.docker.lb.hosts: ${UAS_HOST}
        com.docker.lb.network: uas_network
        com.docker.lb.port: 443
    healthcheck:
      test: curl --fail -s http://localhost:80/ || exit 1
      interval: 10s
      timeout: 10s
      retries: 3
    networks:
      - uas_network
  uas-backend:
    image: ${BACKEND_IMAGENAME}:${BACKEND_IMAGETAG}
    env_file:
      - ${APP_ENV}.env
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:rw
      - /input:/input:rw
      - entwine:/data/entwine:rw
      - logs:/data/logs:rw
      - media:/data/media:rw
      - odm:/data/odm:rw
      - pix4d:/data/pix4d:rw
      - processed_data:/data/processed_data:rw
      - uploads:/data/uploads:rw
      - utils:/data/utils:rw
      - mesh_data:/data/mesh_data:rw
    deploy:
      placement:
        constraints:
          - node.platform.os == linux
    healthcheck:
      test: supervisorctl status | grep RUNNING > /dev/null && exit 0 || exit 1
      interval: 10s
      timeout: 10s
      retries: 3
    networks:
      - uas_network

volumes:
  entwine:
    driver: cloudstor:azure
  logs:
    driver: cloudstor:azure
  media:
    driver: cloudstor:azure
  odm:
    driver: cloudstor:azure
  pix4d:
    driver: cloudstor:azure
  processed_data:
    driver: cloudstor:azure
  uploads:
    driver: cloudstor:azure
  utils:
    driver: cloudstor:azure
  tmp:
    driver: cloudstor:azure
  mesh_data:
    driver: cloudstor:azure

networks:
  uas_network:

Listing 4.10: Docker-compose file for COP

The Docker Compose file is now discussed in detail. It is written in the YAML Ain't Markup Language (YAML) format. The "version" tag specifies the version of the docker-compose file format used in this configuration. The "services" tag contains the details of the Docker containers to run. Docker Compose allows the containers to be run as Docker services, which makes it possible to control the configuration of each container and to spawn it in a multi-server system. In the Docker Compose file above, the two containers are configured as two services named "uas-website" and "uas-backend". The "image" tag specifies the Docker image to use for the container. The "env_file" tag specifies the environment file to be passed to the container. The "volumes" tag specifies all the volumes to be mounted on the container; it is also used, at the bottom of the docker-compose file, to declare the volumes that need to be created. Since the Cloudstor volume plugin is used for Azure File Share, the "driver" tag specifies it under the volume declarations. The "deploy" tag holds all configurations for deploying the application: the "replicas" tag gives the number of copies of the container to run, and the "labels" tag carries the load balancer configuration. The "healthcheck" tag performs a health check on the container and respawns it if it runs into an error. The "networks" tag specifies the name of the custom network used by the stack and the services running under it.

In order to use the file, the Docker command for deploying a stack of containers is needed. Listing 4.11 shows the command that starts the stack of containers.

docker stack deploy -c docker-compose.yml uas

Listing 4.11: Deploy Docker container stack for COP

After creating the Docker Compose file and deploying it as a stack using Docker Swarm, the swarm manager schedules the containers onto the multiple worker nodes it has access to.
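As a quick check after deployment, the placement of the stack can be inspected from a manager node. The commands below are a minimal sketch assuming the stack was deployed with the name "uas" as in Listing 4.11; the service name follows Docker's stack_service naming convention.

docker node ls                         # list the manager and worker nodes in the swarm
docker stack ps uas                    # show which node each task of the stack has been scheduled on
docker service logs uas_uas-backend    # stream the logs of the backend service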

4.1.4 Automation of code by removing hard-coded values

The COP was successfully deployed using this methodology, but errors subsequently arose in the processing pipeline. It was observed that the pipeline was broken because some pieces of code contained hard-coded values. In this section, the removal of all hard-coded values is discussed.

GREYHOUND_SERVER = ''

Listing 4.12: Initial snippet of settings.py(Website)

backend_host_file = '/opt/uas/logs/backend_host'
while not os.path.exists(backend_host_file):
    time.sleep(5)
f = open(backend_host_file, 'r')
GREYHOUND_SERVER = f.read()
f.close()

Listing 4.13: Updated snippet of settings.py(Website)


Listings 4.12 and 4.13 show the initial and updated code in the settings.py file of the website container. Initially, the value of the Greyhound server hostname was entered at the time of deploying the application. Since deployment is now automated and dynamic over a multi-server system, the value has to be set once the backend container has started running and the hostname of the server it is running on can be obtained.

api_key = os.environ['API_KEY']

# PIX4D credentials
username = os.environ['PIX4D_USER']
password = os.environ['PIX4D_PASSWD']

Listing 4.14: Initial snippet of settings.py(Backend)

Listing 4.14 shows the updated values of the api_key, username and password variables in the backend container's settings.py file. The initial values were hard-coded, which worked before because the backend module was not in a container. After the creation of the backend container, these values could no longer be kept hard-coded, so they were replaced with environment variables that are passed to the container at runtime.

All the directory volumes mounted onto each container spawned by the backend controller were hard-coded. Due to this, the new Azure File Share volumes could not get mounted and processing would run into errors. Each of these values was updated to dynamically obtain the name of the volume to mount. The listings below display these changes made to pix4d.py, entwine.py and greyhound.py.

self.client.containers.run(image='connormanning/entwine',
                           name=container_name,
                           command=command,
                           volumes={
                               '/input': {
                                   'bind': '/data',
                                   'mode': 'rw'
                               },
                               ''.join([os.environ['STACK_NAME'], '_entwine']): {
                                   'bind': '/entwine/',
                                   'mode': 'rw'
                               }
                           })

Listing 4.15: Updated snippet of pix4d.py

self.client.containers.run(image=self.image,
                           name=container_name,
                           volumes={
                               ''.join([os.environ['STACK_NAME'], '_pix4d']): {
                                   'bind': ''.join([self.container_dir]),
                                   'mode': 'rw'
                               }
                           },
                           environment={
                               'EMAIL_ADDRESS': self.username,
                               'PASSWORD': self.password,
                               'PROJECT_NAME': self.task_name
                           })

Listing 4.16: Updated snippet of entwine.py

self.client.containers.run(image='connormanning/http-server',
                           detach=True,
                           name=greyhound_name,
                           ports={
                               '8080': os.getenv('GREYHOUND_PORT')
                           },
                           volumes={
                               ''.join([os.environ['STACK_NAME'], '_entwine']): {
                                   'bind': '/var/www',
                                   'mode': 'rw'
                               }
                           })

Listing 4.17: Updated snippet of greyhound.py

All these code updates resolved the errors and the COP processing pipeline was tested successfully.

4.1.5 Oracle Database Connectivity

The COP initially used the PostgreSQL[4] database, which is open source. However, the ODOT IT team did not support its usage, as they provided support only for the Oracle Database[8]. Thus, the COP had to be migrated from the PostgreSQL database to the Oracle database. To achieve this, the necessary Oracle client packages first had to be installed inside the website container; the Oracle client modules consist of the basic, sqlplus and developer packages, which allow command-line connectivity to the Oracle database. Since Django creates the database tables using the schema defined in its models component, the database engine was the only thing to change. Listing 4.18 shows the updated database engine value for the Oracle Database.

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.oracle',
        'NAME': ' ',
        'USER': ' ',
        'PASSWORD': ' ',
    }
}

Listing 4.18: Updated database engine in settings.py(Website)

After implementing these steps, the Oracle Database was successfully integrated with the COP.
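For reference, the sketch below illustrates the kind of Oracle client installation referred to above, assuming the Oracle Instant Client 19.x zip packages and the cx_Oracle Python driver; the exact package versions, file names and paths used inside the website container may differ.

# Hypothetical sketch: Oracle Instant Client (basic, sqlplus, sdk) inside the Ubuntu-based website image
apt-get update && apt-get install -y libaio1 unzip
unzip instantclient-basic-linux.x64-19.8.0.0.0dbru.zip -d /opt/oracle
unzip instantclient-sqlplus-linux.x64-19.8.0.0.0dbru.zip -d /opt/oracle
unzip instantclient-sdk-linux.x64-19.8.0.0.0dbru.zip -d /opt/oracle
echo /opt/oracle/instantclient_19_8 > /etc/ld.so.conf.d/oracle-instantclient.conf && ldconfig
pip install cx_Oracle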


4.2 Other enhancements

Apart from the enhancements made for the multi-server architecture, some other enhancements were made to the system. They are discussed in detail in this section.

4.2.1 Upload Issues

After shifting the storage from local storage to cloud-based storage, the upload cap of 4GB persisted. The COP uses Nginx as the web server to handle all requests and serve files to users. It also manages all the networking configuration for data uploaded to the application and the timeout values for the connections being made. To increase the upload limit, the following changes were made to the Nginx configuration files of the application. In the uas_nginx_dev.conf file, the client_max_body_size variable was increased from 4GB to 10GB. In the nginx.conf file, the keepalive_timeout, proxy_read_timeout, and proxy_send_timeout variables were increased to 1800 seconds. Listings 4.19 and 4.20 display the code changes made to uas_nginx_dev.conf and nginx.conf.

client_max_body_size 10G;

Listing 4.19: Updated client_max_body_size variable snippet

keepalive_timeout 1800s;
proxy_read_timeout 1800s;
proxy_send_timeout 1800s;

Listing 4.20: Updated keepalive_timeout, proxy_read_timeout and proxy_send_timeout variables snippet

Initially, the changes resulted in success, but on further testing, upload issues were observed. When the user uploaded a large dataset, the page would refresh after the upload bar reached 90% and only a fraction of the files would show up. In the browser console window, a "Failed to load resource: the server responded with a status of 504" message appeared every time the error was observed. After multiple refreshes, all the uploaded files did appear. To resolve this, an extensive study was done of each module of code related to the uploads. After the study, the cause of the issue was discovered in the load balancer. The load balancer runs on an Nginx web server and hence has its own configuration. In order to observe uploads without any errors, it is necessary to keep the timeout values the same in both the load balancer and the application Nginx configuration files. After notifying the ODOT personnel, the load balancer configuration file was updated with the same timeout values mentioned in the application's nginx.conf file. This resolved the issue, and the upload capacity is now increased to 10GB.

4.2.2 Delete Automation

One of the underlying issues of the application was its inability to delete files from the file storage even after they were deleted on the application. This led to the file storage being completely utilized, rendering the COP unable to process anything. To resolve this issue, a delete automation workflow is proposed.

Figure 4.1: Delete Automation Workflow

As per the workflow, a supervisor-based worker process is generated, which repeatedly invokes the bash script delete.sh. This bash script controls the interval at which the workflow is invoked and executes a Python script named file-del.py. This Python script contains the algorithm to delete files that have already been deleted on the application. Since the website container does not have access to all the storage directories of the application, the workflow is maintained on both the website and backend containers. The supervisor configuration and the delete bash script are the same on both containers; the file-del.py script differs between them. The algorithm on the website container first runs a query on the database to find all the files available on the website and stores them as a list. Using the "os" Python module, it lists all the files available in the file storage and stores them as another list. Finally, it compares the two lists and deletes those files which are in the second list but not in the first list. To maintain a log of the process, it prints the name of each file deleted, the total number of files deleted and the total number of files uploaded.

The algorithm on the backend container differs in a couple of ways. Instead of maintaining a list of files, it maintains a list of project folders on the COP. It compares this list with another list that consists of the projects on the server. Since the projects can either be Pix4d-based projects or just 3D model visualization based projects, two checks are done. For the first case, the list from the COP is checked against the list of projects under the pix4d directory in the file storage, and those projects which are in the second list but missing from the first list are deleted. It also checks for files to delete from two additional directories, namely entwine and processed_data. For the second case, the first list stays the same but the second list consists of the projects in the entwine directory. The principle of comparing the two lists is the same. After deletion, it prints the count of processed and deleted Pix4d and custom projects. Listings 4.21, 4.22, 4.23 and 4.24 display the supervisord.conf file, the delete.sh file and file-del.py for both the website and backend modules.

[program:delete]
command = /opt/backend/delete.sh                      ; Command to start app
user = root                                           ; User to run as
stdout_logfile = /data/logs/supervisor-backend.log    ; Where to write log messages
autostart = True
autorestart = True
redirect_stderr = true                                ; Save stderr in the same log
environment = LANG=en_US.UTF-8,LC_ALL=en_US.UTF-8     ; Set UTF-8 as default encoding

Listing 4.21: Supervisor configuration file

#!/usr/bin/env bash
n=900
sleep ${n}
exec python /opt/backend/file-del.py

Listing 4.22: Snippet of delete.sh

import cx_Oracle
import os
from datetime import datetime

now = datetime.now()
print("[ {} ]".format(now))

conn = cx_Oracle.connect(user=os.getenv('PSQL_USER'),
                         password=os.getenv('PSQL_PASSWORD'),
                         dsn=os.getenv('PSQL_DB'))
base_dir = '/opt/uas/app/uploads/'
uploads = os.listdir(base_dir)
c = conn.cursor()
c.execute("select docfile from dashboard_uploadedfiles")
query = []
delete = []
query = c.fetchall()
count = 0
countf = 0
for row in range(0, len(query)):
    query[row] = str(query[row][0])
for i in range(0, len(uploads)):
    uploads[i] = ''.join([base_dir, uploads[i]])
print(len(uploads))
delete = list(set(uploads) - set(query))
for dire in delete:
    if os.path.exists(dire):
        countf += 1
        os.remove(dire)
        print("{} deleted".format(dire))
print("Count of total uploaded files:", len(uploads) - countf)
print("Count of deleted files:", countf)

Listing 4.23: Snippet of file-del.py(Website)

for dire in pix4d:
    ind = False
    for row in query2:
        if os.path.exists(''.join([basicp, row[0]])):
            # print(row[0])
            if dire == row[0]:
                count += 1
                ind = True
                break
            else:
                continue
    if ind == False:
        countf += 1
        if os.path.exists(''.join([basicp, dire])):
            rmtree(''.join([basicp, dire]))
            print("{} deleted".format(''.join([basicp, dire])))
        if os.path.exists(''.join([basicpr, dire, '.zip'])):
            os.remove(''.join([basicpr, dire, '.zip']))
            print("{} deleted".format(''.join([basicpr, dire, '.zip'])))
        if os.path.exists(''.join([basice, dire])):
            rmtree(''.join([basice, dire]))
            print("{} deleted".format(''.join([basice, dire])))
for dire in entwine:
    ind = False
    for row in query2:
        if os.path.exists(''.join([basice, row[0]])):
            # print(row[0])
            if dire == row[0]:
                if "custom" in dire:
                    counte += 1
                ind = True
                break
            else:
                continue
    if ind == False:
        countfe += 1
        if os.path.exists(''.join([basice, dire])):
            rmtree(''.join([basice, dire]))
            print("{} deleted".format(''.join([basice, dire])))
print("Count of processed Pix4d projects:", count)
print("Count of deleted processed Pix4d projects:", countf)
print("Count of custom projects:", counte)
print("Count of deleted custom projects:", countfe)

Listing 4.24: Snippet of file-del.py(Backend)


4.2.3 Logging Server Metrics

In order to improve the utilization of resources, it is necessary to maintain logs of resource utilization. When the COP is running and processing projects, it utilizes most of the system resources, and high CPU utilization is observed on the servers. To monitor Pix4d and any other process running on the servers, a server metric logging system was developed and integrated with the COP. This system uses a workflow similar to the delete automation system: a supervisor-based worker process is configured to run a bash script, metrics.sh. This bash script prints the date, the system RAM usage and the CPU usage of the top 25 processes. Listings 4.25 and 4.26 show the supervisor configuration file and the bash script respectively.

[program:metrics]
command = /opt/backend/metrics.sh                                ; Command to start app
user = root                                                      ; User to run as
stdout_logfile = /data/logs/supervisor/metrics_backend.log       ; Where to write log messages
autostart = True
autorestart = True
redirect_stderr = true                                           ; Save stderr in the same log
environment = LANG=en_US.UTF-8,LC_ALL=en_US.UTF-8                ; Set UTF-8 as default encoding

Listing 4.25: Supervisor configuration file

#!/bin/bash
docker ps | grep backend
n=1
sleep ${n}
date
free -h
ps -eo pid,ppid,cmd,%cpu --sort=-%cpu | head -n 25
echo ""

Listing 4.26: Code snippet of metrics.sh

Chapter 5

Installation of Proposed System

In this chapter, we discuss the system requirements and the software configurations required to install the proposed system.

5.1 System Requirements

The requirements mentioned in this section are necessary for the installation of the proposed application software.

5.1.1 Hardware

1. CPU: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz

2. RAM: 110GB

3. Storage: Azure File Share

4. Disk Space: 100GB to 1TB (per volume)


5.1.2 Software

1. Operating System: RedHat Enterprise Linux 7.6

2. Docker v19.x or higher

3. Docker-compose v 1.x or higher

4. Cloudstor docker volume plugin v18.x or higher

5. Oracle database 12c or higher

6. Jenkins v2.x or higher (Optional)

5.1.3 Docker images

1. opendronemap/opendronemap:v0.3.1 (pull from docker-hub; example pull commands are shown after this list)

2. connormanning/entwine (pull from docker-hub)

3. connormanning/http-server (pull from docker-hub)

4. docker-pix4d (build from source, see upcoming sections for more information)

5. common operating platform uas-website (build from source, see upcoming sections for more information)

6. common operating platform uas-backend (build from source, see upcoming sections for more information)
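The images marked as "pull from docker-hub" above can be fetched ahead of deployment with the standard "docker pull" command, for example:

docker pull opendronemap/opendronemap:v0.3.1
docker pull connormanning/entwine
docker pull connormanning/http-server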


5.2 Software Configurations

5.2.1 Building Docker images for website, backend and Pix4D modules

In order to start installing the application, the base Docker images for the website, backend and Pix4D modules need to be present on the server. To start with, the application code needs to be cloned from the TFS code repository, providing credentials to gain access.

git clone http://tfs.dot.state.oh.us:8080/tfs/DoIT/_git/UAS

Listing 5.1: Command to clone code repository

After cloning, go to the "Ubuntu-Base-Images" folder inside the cloned repository.

cd UAS/Ubuntu-Base-Images

Listing 5.2: Command to change directory

Build both the base Docker images for the website and backend modules using the "docker build" command.

docker build -f /home/a50196227/UAS/Ubuntu-Base-Images/Dockerfile.website -t dockerdtrdev.dot.state.oh.us/odotdev/uas-base:website_2 .

docker build -f /home/a50196227/UAS/Ubuntu-Base-Images/Dockerfile.backend -t dockerdtrdev.dot.state.oh.us/odotdev/uas-base:backend_2 .

Listing 5.3: Docker build command to create base images for website and backend containers

For building the complete website and backend Docker images, execute the commands listed below from their respective folder directories on the system command line. These commands should only be executed if the application is being deployed manually without using Jenkins.

docker build -t dockerdtrdev.dot.state.oh.us/odotdev/uas-website .
docker build -t dockerdtrdev.dot.state.oh.us/odotdev/uas-backend .

Listing 5.4: Docker build command to create complete images for website and backend containers

For the Pix4d container, go to the "docker-pix4d" directory. Create a folder called "bin" inside the directory and copy the "pix4dengine-0.1.0-py3-none-any.whl" and "pix4dmapper_4.3.31_amd64.deb" files into it.[2] Run the "docker build" command to build a Docker image for Pix4D from the "docker-pix4d" directory.

docker build -t docker-pix4d .

Listing 5.5: Docker build command to create Pix4D image

Finally, push the images to the Docker Trusted Registry (DTR) so that they are accessible to Jenkins.
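A minimal sketch of this push step is shown below, assuming the image names from Listing 5.4 and that a prior "docker login" to the DTR has been performed; the Pix4D repository name is a placeholder.

docker login dockerdtrdev.dot.state.oh.us
docker push dockerdtrdev.dot.state.oh.us/odotdev/uas-website
docker push dockerdtrdev.dot.state.oh.us/odotdev/uas-backend
# The Pix4D image needs to be retagged with the DTR prefix before it can be pushed (hypothetical repository name)
docker tag docker-pix4d dockerdtrdev.dot.state.oh.us/odotdev/docker-pix4d
docker push dockerdtrdev.dot.state.oh.us/odotdev/docker-pix4d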

5.2.2 Azure File Share

Create an account on Microsoft Azure and create a storage account to use Azure File Share. While creating the storage account, disable the "Secure transfer required" option in order to install the Cloudstor plugin. Make a note of the account credentials to use them in the command for installing the Cloudstor plugin.
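Equivalently, the storage account can be created from the Azure CLI as sketched below; the resource group, account name, and location are placeholders, and "--https-only false" corresponds to disabling the "Secure transfer required" option.

az group create --name uas-rg --location eastus
az storage account create --name uasfilestorage --resource-group uas-rg --sku Standard_LRS --https-only false
az storage account keys list --account-name uasfilestorage --resource-group uas-rg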

5.2.3 Oracle Database

Install the Oracle database based on the available documentation[38], or create a Virtual Machine with the Oracle database installed on any of the cloud-technology-based platforms. After the database is created, create a user with the specific privileges mentioned in the documentation for the Common Operating Platform.[39]
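As an illustration, such a user could be created through sqlplus as sketched below; the user name, password, and privilege list are placeholders, and the authoritative privilege list is the one given in the COP documentation.[39]

# Hypothetical sketch: creating an application user from the command line
sqlplus "sys/<sys_password>@//<db_host>:1521/<service_name> as sysdba" <<EOF
CREATE USER uas_app IDENTIFIED BY "change_me";
GRANT CREATE SESSION, CREATE TABLE, CREATE SEQUENCE, CREATE VIEW TO uas_app;
ALTER USER uas_app QUOTA UNLIMITED ON USERS;
EOF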

5.2.4 Jenkins

If the application needs to be deployed in a CICD-based approach using Jenkins, then the following steps need to be performed:

1. Create a ”Folder” by clicking on the ”New Item” and name the folder.

2. Create a "Multi Configuration Project" by clicking on "New Item" and name the project.

3. A configuration page will be loaded. Under the first section, select ”Promote builds when...” option to create workflows for development and production builds if needed.

4. Under Build Triggers, select the ”Build when a change is pushed to TFS/Team Services” option.

5. Under source code management, select Git and mention the TFS code repos- itory link.

6. Under Build Environment, select the ”Use secret text(s) or file(s)” option.

7. Under bindings, three bindings need to be created which are the username and password for Docker Trusted Registry (DTR), the UCP bundle zip file and the environment file needed for the build.

8. Under Build, two bash scripts need to be created. The first one builds the Docker images and pushes them to the DTR. The second one is used to deploy the application. Listings 5.6 and 5.7 display the two bash scripts in that order.


9. After setting up Jenkins, the developer needs to hit the build button on the project homepage to build and deploy the application.

export DTR_SERVER=dockerdtrdev.dot.state.oh.us
export APP_ENV=dev
echo ${DTR_PASSWORD} | docker login -u ${DTR_USER} --password-stdin ${DTR_SERVER}

## Build Backend controller image
cd UAS-Backend-Controller
docker build -t ${DTR_SERVER}/odot${APP_ENV}/uas-backend:${BUILD_NUMBER} .
cd -

## Build Website image
cd UAS-Website
docker build -t ${DTR_SERVER}/odot${APP_ENV}/uas-website:${BUILD_NUMBER} .

# Push Images to DTR
docker image push ${DTR_SERVER}/odot${APP_ENV}/uas-website:${BUILD_NUMBER}
docker image push ${DTR_SERVER}/odot${APP_ENV}/uas-backend:${BUILD_NUMBER}
echo "Finished the build and pushed the image to DTR."

Listing 5.6: Jenkins configuration script 1

echo "Starting the deployment of the build to Development environment on `date`."
export DTR_URL=dockerdtrdev.dot.state.oh.us
export FRONTEND_IMAGENAME=${DTR_URL}/odotdev/uas-website
export FRONTEND_IMAGETAG=${BUILD_NUMBER}
export BACKEND_IMAGENAME=${DTR_URL}/odotdev/uas-backend
export BACKEND_IMAGETAG=${BUILD_NUMBER}
export APP_ENV=dev
export UAS_HOST=uasdev.dot.state.oh.us
export NFS_MOUNT=/uas/${APP_ENV}/uasdev
echo "Started the deployment of uas stack"
ls -l
cat $DEV_ENV > ${APP_ENV}.env
ls -l
cat docker-compose.yml
cat ${APP_ENV}.env
echo "UCP_BUNDLE=${UCP_BUNDLE}"
cd ${UCP_BUNDLE}
../env.sh
cd -
BACKEND_HOSTFILE=${NFS_MOUNT}/logs/backend_host
if [ -f $BACKEND_HOSTFILE ]
then
    rm -f $BACKEND_HOSTFILE > /dev/null 2>&1
fi
docker stack deploy -c docker-compose.yml uasdev
echo "Finished with the deployment of uas stack on `date`."

Listing 5.7: Jenkins configuration script 2

5.2.5 Manual Installation

In order to perform the installation manually, use the "start.sh" script in the UAS directory to set the environment variables for the "docker-compose" file. Update the environment file used in the "docker-compose" file. Listing 5.9 displays the command to run for deploying the application.

echo "Started setting the environment for uas application deployment ...."
export DTR_URL=dockerdtrdev.dot.state.oh.us
export FRONTEND_IMAGENAME=${DTR_URL}/odotdev/uas-website
export FRONTEND_IMAGETAG=wlb
export BACKEND_IMAGENAME=${DTR_URL}/odotdev/uas-backend
export BACKEND_IMAGETAG=wlb
export APP_ENV=dev
export UAS_HOST=uasdev.dot.state.oh.us
export PROJECT_DIR=/data/uas

Listing 5.8: Code snippet of start.sh script

docker stack deploy -c docker-compose.yml uas

Listing 5.9: Code snippet of start.sh script

Chapter 6

Results

6.1 Multi Server Architecture

The Common Operating Platform (COP) can now be deployed on a multi-server-based architecture. In the listing below, the Docker application stack can be observed by running the "docker stack ls" command.

-sh-4.2$ docker stack ls | grep uas
uasdev    2    Swarm

Listing 6.1: Code snippet to view Docker stacks

In the listing below, the Docker services for both the website and backend containers can be observed by running the "docker service ls" command.

-sh-4.2$ docker service ls | grep uas
x8eqj80i8kw6   uasdev_uas-backend   replicated   1/1   dockerdtrdev.dot.state.oh.us/odotdev/uas-backend:wlb
rqr7ywuo76b0   uasdev_uas-website   replicated   1/1   dockerdtrdev.dot.state.oh.us/odotdev/uas-website:wlb

Listing 6.2: Code snippet to view Docker services

In the listing below, the containers for the website, backend and http-server can be observed.

-sh-4.2$ docker ps | grep uas
6fee9741088a  connormanning/http-server                              "http-server /var/ww…"  7 hours ago   Up 7 hours   10.239.96.10:50001->8080/tcp  dotidkwrkd01/uasdev.dot.state.oh.us.connormaning
a4216189e0bf  dockerdtrdev.dot.state.oh.us/odotdev/uas-backend:wlb   "/bin/bash /opt/back…"  11 hours ago  Up 11 hours                                dotidkwrkd01/uasdev_uas-backend.1.kdamyhaj8hwr3g8voqbjs81jm
9980592361ef  dockerdtrdev.dot.state.oh.us/odotdev/uas-website:wlb   "/bin/bash /opt/star…"  11 hours ago  Up 11 hours                                dotidkwrkd01/uasdev_uas-website.1.qs2fijleqdw0myzdkbwpq7hkc

Listing 6.3: Code snippet to view Docker containers

Figures 6.1 and 6.2 display the application homepage and processed model on the new architecture for the COP.

Figure 6.1: Common Operating Platform (Proposed system)


Figure 6.2: Processed Model (Proposed System)

6.2 Upload Limit

The upload limit is now increased to 10GB. Multiple tests were conducted to perform uploads. Table 6.1 showcases the multiple test cases performed.

Table 6.1: Performance analysis of the uploads workflow

Sr. No.   No. of Images   Dataset size   Status
1         33              220 MB         Successful
2         67              455 MB         Successful
3         115             917 MB         Successful
4         550             4.44 GB        Successful
5         643             5.25 GB        Successful
6         772             6.3 GB         Successful
7         902             7.4 GB         Successful
8         1036            8.45 GB        Successful
9         1132            9.24 GB        Successful
10        1213            9.9 GB         Successful

The results in the table demonstrate the ability of the application to handle file uploads greater than 4GB. It can now handle file uploads up to 10GB.


6.3 Delete Automation

Delete automation is successfully achieved. Figures 6.3 and 6.4 are screenshots of the log files maintained for the workflows running in the website and backend containers respectively. The script is currently scheduled to run every 15 minutes, after which all the files marked for deletion on the website are deleted.

Figure 6.3: Snippet of delete automation workflow output on website container

Figure 6.4: Snippet of delete automation workflow output on backend container

6.4 Server Metric Logging

Logging of the server metrics is successfully maintained in the logs directory inside the file storage. In order to monitor Pix4d, these logs were compared with the ones generated by Pix4D. Figures 6.5 and 6.6 show the graphs for both RAM and CPU. It is observed that they are nearly identical to each other, with a slight offset in their values. This shows that the values observed in the Pix4D logs are similar to the server logs.

Figure 6.5: RAM comparison

Figure 6.6: CPU comparison

Figure 6.7 below shows a sample output of the log file for the server running the website container. On average, logs of up to 6 days are maintained on the servers. This is beneficial as any irregularities in resource utilization can be captured.

Figure 6.7: Snippet of server metric logging workflow output

Chapter 7

Conclusion

In this thesis work, the old and new architectures of the Common Operating Platform are discussed. All the components introduced in the new architecture are discussed in detail. The packaging of the backend processing module in a Docker container is implemented. The creation of the docker-compose file for Docker Swarm container orchestration is explained. The integration of the cloud-based Azure File Share and the Oracle Database is discussed. The upload limit is increased from 4GB to 10GB and is benchmarked by conducting multiple tests. The delete automation workflow was successfully integrated to automatically delete files every 15 minutes. The Pix4d logs were compared against the system logs generated by the server metric logging workflow. Finally, the system requirements and the steps to install the application with both a CICD-based and a manual approach are discussed.

7.1 Future Scope

In this thesis, an already existing application named the Common Operating Platform was scaled up from a single-server based system to a multi-server based system. This was achieved using advanced orchestration tools and various cloud computing technologies. However, there is still room for improvement regarding certain aspects of the application. Below is a list of tasks that can be considered for future enhancements to the application:

1. The processing unit of the application uses Pix4d for 3D modeling, which is the most CPU and GPU intensive task in the entire processing workflow. Currently, the application allows only a single user to process a project while the projects from other users are maintained in a queue. Even though the application architecture has been improved from a single-server based architecture to a multi-server based architecture, this limitation still persists. To resolve it, the number of backend containers needs to be scaled up and a message-broker-based software used to manage these containers. The containers can be scaled up using Docker Swarm[5], whereas RabbitMQ[40] can be used as the message broker. The message broker will communicate among the containers by maintaining a queue of the available tasks and assigning them to the backend containers that are available to process.

2. The application does not allow the user to reprocess a project which has run into an error. Due to this, the user needs to create a new folder and reupload the files with the errors rectified. This is not user-friendly and is a time-consuming process. However, this limitation can be resolved by adding a button that prompts the application to reprocess the project. The button, on being clicked, would invoke a script that updates the database and deletes the old copies of processed files to allow the application to create new ones.

3. The application doesn't notify the user with information regarding errors observed while processing a project. This can be resolved by creating a link on the project display page for the user to download the standard log file generated for each project. This log file displays information on all steps performed during processing and lists out errors if observed in any of the steps.

4. Other processing workflows like traffic monitoring can be integrated with the application.

References

[1] Margaret Rouse. What is distributed file system (DFS)? - Definition from WhatIs.com. 2005. url: https://searchitoperations.techtarget.com/definition/distributed-cloud; https://searchoracle.techtarget.com/definition/distributed-database; http://searchwindowsserver.techtarget.com/definition/distributed-file-system-DFS (visited on 10/28/2020).

[2] Niranjan R Krishnan et al. "A Web-Based Software Platform for Data Processing Workflows and its Applications in Aerial Data Analysis". MA thesis. 2019.

[3] Docker. Docker overview — Docker Documentation. 2018. url: https://docs.docker.com/get-started/overview/ (visited on 11/01/2020).

[4] PostgreSQL. PostgreSQL: About. 2016. url: https://www.postgresql.org/about/ (visited on 10/31/2020).

[5] Docker Inc. Swarm mode key concepts - Docker Documentation. 2019. url: https://docs.docker.com/engine/swarm/key-concepts/ (visited on 10/27/2020).

[6] Docker Inc. Overview of Docker Compose — Docker Documentation. 2017. url: https://docs.docker.com/compose/ (visited on 11/01/2020).

[7] Microsoft. Introduction to Azure Files — Microsoft Docs. 2018. url: https://docs.microsoft.com/en-us/azure/storage/files/storage-files-introduction (visited on 10/29/2020).

[8] Oracle. What is Oracle Database. 2019. url: https://www.oracletutorial.com/getting-started/what-is-oracle-database/ (visited on 10/30/2020).

[9] Google Inc. What are Containers and their benefits — Google Cloud. 2019. url: https://cloud.google.com/containers/ (visited on 10/28/2020).

[10] Connor Manning. GitHub - potree/potree: WebGL point cloud viewer for large datasets. url: https://github.com/potree/potree/ (visited on 11/01/2020).

[11] Jenkins Developers. Jenkins User Documentation. 2018. url: https://www.jenkins.io/doc/ (visited on 10/29/2020).

[12] Django Software Foundation. Django at a glance — Django documentation — Django. url: https://docs.djangoproject.com/en/3.1/intro/overview/ (visited on 11/02/2020).

[13] Pix4d. Pix4Dengine: our engine and your custom solution — Pix4D. url: https://www.pix4d.com/product/pix4dengine (visited on 11/02/2020).

[14] OpenDroneMap. Open Source Toolkit for Processing Aerial Imagery - OpenDroneMap. url: https://www.opendronemap.org/odm/ (visited on 11/02/2020).

[15] Docker. What is a Container? — App Containerization — Docker. 2020. url: https://www.docker.com/resources/what-container (visited on 10/29/2020).

[16] AVI Networks. What is Container Orchestration? Definition & Related FAQs — Avi Networks. 2020. url: https://avinetworks.com/glossary/container-orchestration/ (visited on 10/29/2020).

[17] Docker Inc. Dockerfile reference — Docker Documentation. 2020. url: https://docs.docker.com/engine/reference/builder/ (visited on 11/01/2020).

[18] IBM. Docker Swarm architecture. url: https://www.ibm.com/support/knowledgecenter/en/SSD29G_2.0.0/com.ibm.swg.ba.cognos.tm1_inst.2.0.0.doc/paw_distributed_architecture.html (visited on 11/02/2020).

[19] Docker. How services work — Docker Documentation. url: https://docs.docker.com/engine/swarm/how-swarm-mode-works/services/ (visited on 11/02/2020).

[20] Justin Slick. The Definition of 3D Modeling. 2020. url: https://www.lifewire.com/what-is-3d-modeling-2164 (visited on 10/29/2020).

[21] Pix4d. Pix4D: drone mapping & photogrammetry software. url: https://www.pix4d.com/product/pix4dmapper-photogrammetry-software (visited on 11/02/2020).

[22] Microsoft. Microsoft Azure - What is Azure. 2016. url: https://azure.microsoft.com/en-us/overview/what-is-azure/ (visited on 10/29/2020).

[23] Andrew Pfeiffer. What is a web server? - Learn web development — MDN. 2019. url: https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_web_server.

[24] NGINX. What is NGINX? - NGINX. 2020. url: https://www.nginx.com/resources/glossary/nginx/ (visited on 10/30/2020).

[25] Owen Garret. Inside NGINX: Designed for Performance & Scalability. 2015. url: https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/ (visited on 10/30/2020).

[26] GoodFirms. What is a Web Framework? 2020. url: https://www.goodfirms.co/glossary/web-framework/ (visited on 10/30/2020).

[27] Django Software Foundation. The Web framework for perfectionists with deadlines — Django. 2018. url: https://www.djangoproject.com/.

[28] EDUCBA. Django Architecture — Working System Of Django MVT Architecture. 2020. url: https://www.educba.com/django-architecture/ (visited on 10/30/2020).

[29] Nigel George. Django's Structure – A Heretic's Eye View - Python Django. 2020. url: https://djangobook.com/mdj2-django-structure/ (visited on 10/30/2020).

[30] Simplilearn. What is Jenkins: Features and Architecture Explained. 2020. url: https://www.simplilearn.com/tutorials/jenkins-tutorial/what-is-jenkins (visited on 10/30/2020).

[31] Connor Manning. connormanning/entwine: Entwine - point cloud organization for massive datasets. 2019. url: https://github.com/connormanning/entwine (visited on 10/31/2020).

[32] Chuck Gehman. What Is Microsoft TFS? — Perforce. 2018. url: https://www.perforce.com/blog/vcs/what-team-foundation-server (visited on 10/31/2020).

[33] Saurabh. "What is Jenkins? — Jenkins For Continuous Integration — Edureka". In: edureka! (2019), pp. 1–10. url: https://www.edureka.co/blog/what-is-jenkins/.

[34] Supervisord. Supervisor: A Process Control System. 2015. url: http://supervisord.org/ (visited on 11/02/2020).

[35] Oracle. Welcome to cx_Oracle's documentation! — cx_Oracle 8.1.0-dev documentation. 2020. url: https://cx-oracle.readthedocs.io/en/latest/ (visited on 11/02/2020).

[36] Wikipedia. "Network File System (NFS)". In: 2002. doi: 10.1201/9781420000030.ch18. url: https://en.wikipedia.org/wiki/Network_File_System.

[37] Docker. Docker for Azure persistent data volumes. 2019. url: https://docs.docker.com/docker-for-azure/persistent-data-volumes/ (visited on 11/02/2020).

[38] Oracle. Installing Oracle Database 12c on Windows. 2020. url: https://www.oracle.com/webfolder/technetwork/tutorials/obe/db/12c/r1/Windows_DB_Install_OBE/Installing_Oracle_Db12c_Windows.html (visited on 11/01/2020).

[39] Niranjan Krishnan. Standard Operating Procedure for a Common Operating Platform for online 3D modeling. Tech. rep. 2019.

[40] RabbitMQ Team. Messaging that just works — RabbitMQ. 2009. url: https://www.rabbitmq.com/ (visited on 10/27/2020).
