ALGOHUB: A WEBSITE OF ALGORITHMS

A Project

Presented to the

Faculty of

California State Polytechnic University, Pomona

In Partial Fulfillment

Of the Requirements for the Degree

Master of Science

In

Computer Science

By

Liang Zhang

2017

SIGNATURE PAGE

PROJECT: ALGOHUB: A WEBSITE OF ALGORITHMS

AUTHOR: Liang Zhang

DATE SUBMITTED: Fall 2017

Computer Science Department

Dr. Gilbert S. Young, Project Committee Chair, Computer Science

Dr. Hao Ji, Computer Science


ACKNOWLEDGEMENTS

First, I would like to thank my wife, Yatong Liang, for her support, not only in my studies but in all aspects of my life. Thanks to all the professors I have met at Cal Poly Pomona, especially Dr. Young and Dr. Ji, who taught me, guided my research, and encouraged me to overcome obstacles. Last but not least, thanks to all the developers contributing to open source projects for building so much wonderful software. Without your work, I could not have finished my project.


ABSTRACT

In this project, a website (AlgoHub) is implemented that hosts content about algorithms, such as the title, algorithm description, applications, and so on. From the functionality perspective, modules for adding, modifying, and searching fulfill the requirements of the different groups of users (students, teachers, software engineers, etc.) who are interested in algorithms. From the technology perspective, the system architecture of AlgoHub is designed for high performance and scalability, matching the nature of this application. During the implementation, rather than building a website from scratch, many practically proven technologies and existing components, such as Jekyll, Jenkins, and Git, were utilized after being carefully evaluated.


TABLE OF CONTENTS

SIGNATURE PAGE

ACKNOWLEDGEMENTS

ABSTRACT

LIST OF FIGURES

CHAPTER 1: INTRODUCTION

CHAPTER 2: PRODUCT DESIGN

2.1 Properties of an Algorithm

2.2 Homepage

2.3 View Algorithm Page

2.4 Add Algorithm Page

2.5 Edit Algorithm Page

2.6 Search Algorithm Page

CHAPTER 3: ENGINEERING DESIGN

3.1 Manager

3.2 Generator

3.3 Searcher

3.4 Infrastructure

CHAPTER 4: TECHNOLOGIES

4.1 Event-based Programming

4.2 Container

4.3 CI/CD

CHAPTER 5: CONCLUSION AND FUTURE WORKS

REFERENCES

APPENDIX A: QUESTION IN CS530 ONLINE EXAM

LIST OF FIGURES

Figure 1: Homepage Mockup

Figure 2: View Algorithm Mockup

Figure 3: Add Algorithm - Other Steps

Figure 4: Add Algorithm - Step 1

Figure 5: Markdown Editor UI (Simplemde)

Figure 6: Embedded Search

Figure 7: AlgoHub System Architecture

Figure 8: Blocking Execution

Figure 9: Non-blocking Execution

Figure 10: A Use Case of Non-blocking

Figure 11: Callback Hell

Figure 12: Promise in ES6

Figure 13: Promise with Branch Statements

Figure 14: Example of Async/Await

Figure 15: Combine Await/Async with Promise

Figure 16: Processes of a Container and the Application Inside

Figure 17: Using a Container as a Command Line Tool

Figure 18: Containers in Development Environment

Figure 19: A Simple Dockerfile

Figure 20: Command to Build an Image

Figure 21: Command to Push an Image

Figure 22: Docker Swarm Load Balance

Figure 23: Commands to Control a Service

Figure 24: Jenkinsfile for Manage Module

Figure 25: Screenshot of Jenkins Pipeline Report


CHAPTER 1: INTRODUCTION

In computer science, an algorithm is basically an instance of logic, written in software by software developers, to be effective for the intended "target" computer(s) to produce output from given (perhaps null) input [1]. Nowadays, people search online for an algorithm of interest more than in textbooks, thanks to well-developed websites hosting rich content.

Audience: People searching for an algorithm have different purposes. Educators, such as college professors, may be interested in novel teaching methods to effectively teach algorithms in their classes. Students may look for a course video to help refresh their knowledge, find sample code to help finish their homework, or research algorithms related to the ones learned in their classes to extend their knowledge. Software engineers may look for algorithms to build their applications, or to learn a practical implementation in a specific programming language.

Why AlgoHub? A lot of websites provide content about algorithms, including personal blogs, search engines, and knowledge bases (Wikipedia being one of the most popular). Despite the large number of websites out there, one can hardly find a site specifically designed to fulfill the requirements of all of our audiences.

In this project, a website (http://www.algohub.me) was implemented to host algorithm content. From the feature perspective, it was designed to satisfy the requirements of the audiences, who use the website for different purposes. From the technology perspective, some modern technologies were leveraged in this project, such as serverless and event-based programming. Moreover, performance and scalability were also taken into account, according to the nature of this application.


In the rest of the report, I will go through: product design, which explains the page layouts and interactions; engineering design, which covers the implementation details of the modules; technologies, which discusses the programming languages, components, and architecture used in the project; and conclusion and future works, which lists the work that has been done and what can be improved in the future.


CHAPTER 2: PRODUCT DESIGN

From the feature perspective, the AlgoHub website consists of three modules:

• View algorithm: A page delivering the content of an algorithm.

• Search algorithm: A search box placed on pages and a popup layer showing search results.

• Manage algorithm: A series of pages covering the functionality of adding a new algorithm and modifying an existing algorithm.

All pages in these three modules share the same page layout to build a consistent UI experience for users. The basic page layout includes three parts: header, content, and footer.

2.1 Properties of an Algorithm

As mentioned in the introduction, different audiences may have different expectations of the content of an algorithm and of its medium. For instance, an educator may look for a video and a short description of a new way to teach an algorithm. In this case, a section titled "Education" with both text and video formats should be considered in the design. Taking all the audiences into account, the properties of an algorithm are defined as follows:

• Name of the algorithm.

• Description: briefly introduces what the algorithm is.

• Pseudo code: demonstrates the high-level idea of the algorithm.

• Complexity: shows the analysis of different types of complexity.

• Education: lists ways of teaching the algorithm.

• Application: lists real-world applications leveraging the algorithm.
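The properties above line up with the step names later used in section 2.4 ("name", "desc", "pcode", "comp", "edu", "app"). As a sketch, a record for one algorithm could be modeled as a plain object; the report does not show AlgoHub's actual data model, so the shape below is an assumption:

```javascript
// Sketch of an algorithm record implied by the properties above.
// Field names follow the step names of section 2.4; "tags" comes from
// the View Algorithm page (section 2.3). This layout is an assumption.
function newAlgorithm(name) {
  return {
    name: name, // name of the algorithm
    desc: '',   // brief description (Markdown)
    pcode: '',  // pseudo code
    comp: '',   // complexity analysis
    edu: '',    // ways of teaching
    app: '',    // real-world applications
    tags: []    // tags shown on the View Algorithm page
  };
}

const algo = newAlgorithm('Depth First Search');
console.log(Object.keys(algo).length); // 7
```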


2.2 Homepage

Figure 1: Homepage Mockup

The Homepage (Fig.1) has only a search box in the middle.

• URL pattern: /

• UI components: Search box (see section 2.6 for details)

• Interactions: See section 2.6

2.3 View Algorithm Page

The View Algorithm page (Fig.2) shows the content of an algorithm whose name is specified in the URL.


Figure 2: View Algorithm Mockup

• URL pattern: /algo/{name}.html

{name} is the algorithm name in URL-safe format, which can consist of digits, letters, and "-", e.g. /algo/depth-first-search.html

• UI components:

Tag list contains all tags associated with the algorithm. Each tag is a link to the search result.

Algorithm detail contains sections for the algorithm properties defined in section 2.1.

• Interactions:

The link on a tag triggers the search behavior with the tag as the search criterion (see section 2.6). An Edit link, which follows each section title, links to the corresponding step in the Edit Algorithm page (see section 2.5).
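The URL-safe name format above (digits, letters, and "-") suggests a small conversion helper, e.g. "Depth First Search" becoming depth-first-search. The routine below is an illustrative sketch; the report does not show AlgoHub's own implementation:

```javascript
// Convert an algorithm name to the URL-safe form used in /algo/{name}.html.
// Illustrative sketch only; not taken from the AlgoHub code base.
function toUrlSafeName(name) {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything illegal into "-"
    .replace(/^-+|-+$/g, '');    // trim leading/trailing dashes
}

console.log(toUrlSafeName('Depth First Search')); // depth-first-search
```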


2.4 Add Algorithm Page

Users use this page to add a new algorithm. Rather than inputting all the content at once, the Add Algorithm page splits the whole process into steps. Fig.3 and Fig.4 show the layouts of the different steps.

Figure 4: Add Algorithm - Step 1

Figure 3: Add Algorithm - Other Steps

• URL pattern: /content/add/{step}

{step} is the name of each step, whose value can be one of "name", "desc", "pcode", "comp", "edu", and "app".

• UI components:

Navigation sidebar, on the right of the page, indicates the current step and provides a way to jump directly to another step if the user wants to skip the steps in between.

Action bar, on the bottom of the page, consists of "Previous", "Next", "Preview", and "Done" buttons.

Editor is a Markdown [2] editor, where the user can type the content in Markdown syntax. It also helps the user check the work visually before saving the content. Fig.5 shows a sample UI of the Markdown editor.

Figure 5: Markdown editor UI (Simplemde)

• Interactions:

Links in the navigation sidebar lead to other steps. When the user clicks a link, the modifications of the current step are not lost.

Buttons in the action bar trigger different actions on the current step. Clicking "Previous" navigates to the previous step; if the current step is the first step, the button is disabled. Clicking "Next" navigates to the next step; if the current step is the last step, the button is disabled. Clicking "Preview" brings up a popup layer, which shows the HTML result of the text in the editor. Clicking "Done" saves the content of the algorithm, including not only the content of the current step but also that of the other steps.
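The Previous/Next rules above can be sketched as a small piece of navigation logic over the ordered step names from the URL pattern. This is an illustrative sketch of the described behavior, not the actual AlgoHub code:

```javascript
// Ordered steps from the /content/add/{step} URL pattern.
const STEPS = ['name', 'desc', 'pcode', 'comp', 'edu', 'app'];

// Sketch of the action bar rules: "Previous" is disabled on the first
// step and "Next" is disabled on the last step.
function actionBarState(step) {
  const i = STEPS.indexOf(step);
  return {
    prevDisabled: i <= 0,
    nextDisabled: i === STEPS.length - 1,
    prev: i > 0 ? STEPS[i - 1] : null,
    next: i >= 0 && i < STEPS.length - 1 ? STEPS[i + 1] : null
  };
}

console.log(actionBarState('name').prevDisabled); // true
console.log(actionBarState('app').nextDisabled);  // true
```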

2.5 Edit Algorithm Page

The Edit Algorithm page shares the same UI and behavior as the Add Algorithm page, except that it fills the pages with the existing content of the algorithm. For details, see section 2.4 (Add Algorithm Page).

2.6 Search Algorithm Page

The user can search for algorithms of interest by tag and name. There is no dedicated page showing the search result; the search functionality is embedded into all pages except the ones in the manage module. Pages with search functionality provide a search box, in which the user can input a tag or keyword, and a popup layer showing the search result. Fig.6 shows what the search functionality looks like on the View Algorithm page.


Figure 6: Embedded Search

• URL pattern: N/A

• UI components:

Search bar consists of a text input box and a search button. It has two UI variants, designed for the Homepage and the other pages. On the Homepage, the search bar is located in the middle of the page (Fig.1), with a larger text input box to fit the page width. On the other pages, it is placed in the page header, as shown in Fig.6.

Search result layer is a popup layer, which lists links to View Algorithm pages.

• Interactions:

Clicking the search button sends a search request to the search module and brings up the search result layer. An error message is shown if a valid search result is not returned.

Clicking a link in the search result layer navigates to the corresponding View Algorithm page in the same browser window or tab.

Clicking outside of the search result layer dismisses the popup layer.


CHAPTER 3: ENGINEERING DESIGN

AlgoHub is built with the goals of being lightweight, high-performance, and extensible. Though it is a simple website from the feature perspective, the technical approaches still need to be carefully evaluated to achieve these goals.

Markdown: Markdown is an easy-to-read, easy-to-write [2] markup language for web writers. Markdown's text formatting syntax helps writers focus on the content they are writing, while simplifying the process of formatting it. For example, to generate the titles of an article, we write the Markdown code as:

# This is my title
## Subtitle

And the Markdown parser will translate it to HTML:

<h1>This is my title</h1>
<h2>Subtitle</h2>

Markdown also has an active developer community, which is another reason to adopt it in this project. Despite the easy-to-write nature of Markdown, there could be many writers who are not willing to learn Markdown syntax but still want to post something on AlgoHub. This makes a visual editor a must-have feature. Instead of building one from scratch, there are plenty of Markdown editor components available online, ready to plug into AlgoHub. There are also Markdown parser packages available for most popular programming languages, which can be used in our preview feature.
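To make the heading translation above concrete, here is a minimal, hand-rolled sketch of what a Markdown parser does for heading lines only. Real parser packages (like the off-the-shelf one AlgoHub's preview feature would use) handle the full syntax; this toy version exists purely to illustrate the transformation:

```javascript
// Minimal sketch: translate "#"-style Markdown headings to HTML.
// Non-heading lines pass through unchanged. Illustrative only.
function renderHeadings(markdown) {
  return markdown
    .split('\n')
    .map((line) => {
      const m = line.match(/^(#{1,6})\s+(.*)$/);
      if (!m) return line;           // not a heading line
      const level = m[1].length;     // number of leading "#" characters
      return `<h${level}>${m[2]}</h${level}>`;
    })
    .join('\n');
}

console.log(renderHeadings('# This is my title\n## Subtitle'));
// <h1>This is my title</h1>
// <h2>Subtitle</h2>
```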

Jekyll [3]: Technically, AlgoHub is a CMS (Content Management System [4]) platform, where users post, search, and view articles. To avoid reinventing the wheel, building AlgoHub on top of an existing CMS is an efficient option.


A traditional CMS, like WordPress [5], packs all the common features, such as add/edit/search, into one box. For those who are satisfied with its existing functionality, it is very easy to set up and run. But if people want to customize it, they have to figure out the complex dependencies between modules, the meaning of every column in the database, and so on. Maintaining such a customized system can also be a painstaking task: once the system has been modified, applying an upgrade from the original author becomes nearly impossible. To make AlgoHub easy to extend, Jekyll seems to be a good choice.

Jekyll is a database-free, Markdown-based, Ruby-powered HTML generator. It takes Markdown, HTML, and CSS files as input and generates a static website. Jekyll is so straightforward that it fits into AlgoHub without any customization. The work to integrate Jekyll mainly involves adding functionality around (not in) Jekyll; in AlgoHub's case, that is searching, automated building, and content inputting. Another advantage of Jekyll is high performance, because nothing beats a pure static page in terms of speed.

Git [6] is a source control management [7] system, which is used as the data storage in AlgoHub instead of a database for several reasons. First, this is a practically proven solution: GitHub [8], one of the largest developer websites, successfully combines Git and Jekyll to serve its documentation pages and users' project pages. Second, Git fits well with the other components in AlgoHub. Jekyll depends on plain text files (Markdown files, HTML layout files, and CSS files) to generate the target pages, and these plain text files are essentially the data that needs to be stored to serve AlgoHub. Git is designed to store general files and folders, which meets this requirement.


Moreover, Git is good at versioning, which can support extending AlgoHub with rollback and document versioning in a future version.

Putting it all together: Fig.7 shows the overall system architecture of AlgoHub. The design mainly focuses on two objectives: flexibility and reliability.

Figure 7: AlgoHub System Architecture

There are three modules in this system: Manager, Generator, and Searcher. These modules are loosely coupled with each other to achieve high flexibility. The Searcher reads a site map via HTTP to build its index and serves searches through an HTTP interface. This is a common method most search engines use; thus, the Searcher can easily be replaced by another search service (such as Google Search) with just a minor code change to adapt to the new service. The Generator takes general files as input to generate HTML pages. It does not matter where the input files are stored; with minor changes to the build script in Jenkins, we can easily switch the data storage from one to another.


Another advantage of this architecture is high reliability. The three modules can be deployed on different servers so that they have little impact on each other from a site operation point of view. For example, if the Manager server crashes by accident, Jenkins [9] will not work because we lose the data storage, but the HTML pages and the Searcher keep working. At the level of the whole website, we just lose the "write" functionality while the "read" part still works. Moreover, the core functionality (the HTML pages hosting the algorithms' content) is rather robust, because hosting static pages is much simpler, and thus has fewer possibilities of running into problems, than serving dynamic pages.

3.1 Manager

The Manager module is implemented in NodeJS [9]. The Express [10] framework is used to serve HTTP requests. The code is organized in the MVC [11] pattern, where the view layer leverages EJS [12] as the template engine to decouple the frontend code from the backend. The workload includes:

• Implement the Add/Edit pages.

• Work with Git to pull/push content changes.

• Package the NodeJS code into a Docker [13] image.

3.2 Generator

There is not much coding work in the Generator; the tasks here are mainly handled by Jenkins and Jekyll. The workload includes:

• Implement the page layout.

• Implement the search behavior.

• Generate the site map.

• Implement a Jekyll plugin for YouTube videos.

• Package the HTML pages into a Docker image.

3.3 Searcher

Solr [14] is the search backend of the Searcher. To meet the requirement of searching by tag and by algorithm name, the Solr schema consists of these fields: URL, tags (comma-separated, searchable), and name (searchable). The Searcher HTTP interface includes:

• Search by tag and name.

• A message listener to trigger rebuilding the index.
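A search over the two searchable fields above could be expressed as a query against Solr's standard /select handler. The sketch below only builds the request URL; the core name "algohub" and the exact query shape are assumptions, not taken from the Searcher's code:

```javascript
// Sketch: build a Solr /select query matching either the "tags" or the
// "name" field. The core name "algohub" is hypothetical.
function buildSolrQuery(solrBase, keyword) {
  const q = `tags:"${keyword}" OR name:"${keyword}"`;
  return `${solrBase}/solr/algohub/select?q=${encodeURIComponent(q)}&wt=json`;
}

console.log(buildSolrQuery('http://localhost:8984', 'graph'));
```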

3.4 Infrastructure

The infrastructure provides running environments for the modules, which include Solr, Jenkins [15], NodeJS, and GitLab. All these components run in Docker containers. For the production environment, a Linux shell script is developed for each component to simplify the Docker operations. A Docker Compose configuration file is developed to minimize the effort of setting things up in the development environment.


CHAPTER 4: TECHNOLOGIES

4.1 Event-based Programming

NodeJS is an event-based JavaScript runtime, which is used in most of the modules of this project, such as the manage and search modules. Blocking and non-blocking calls are the foundation of this event-based programming model. Blocking means the NodeJS process waits for an operation to complete before executing the rest of a program. For example, in Fig.8, the function call "func2" will not be executed until the function call "func1" completes.

function func1() { console.log('func1'); }

function func2() { console.log('func2'); }

func1();
func2();

// console output:
// func1
// func2

Figure 8: Blocking Execution

Non-blocking means the NodeJS process executes the rest of a program without waiting for the completion of the previous operation. In Fig.9, the execution of "func2" is not blocked by the completion of "func1". The non-blocking behavior looks very similar to multi-threading in other programming languages, e.g. Java, where we can create threads and the JVM will execute them simultaneously in the background. However, NodeJS is a single-threaded implementation: it puts the execution of asynchronous operations, such as those in "setTimeout", in its event loop [16], and executes them in the event loop's ticks.


function func1() { // async function
  setTimeout( () => { console.log('func1'); }, 0);
}

function func2() { console.log('func2'); }

func1();
func2();

// console output:
// func2
// func1

Figure 9: Non-blocking Execution

Taking advantage of non-blocking execution, we can perform I/O operations, such as interactions with disks and networks, simultaneously in NodeJS. A typical use case, shown in Fig.10, is performing multiple API calls to the backend.

function apiGetUserList(callback) { // async function
  // http call (getUserList) to backend, 2 seconds
  // callback(userList);
}

function apiGetProductList(callback) { // async function
  // http call (getProductList) to backend, 1 second
  // callback(productList);
}

function showResult(list) { console.log(list); }

apiGetUserList(showResult);
apiGetProductList(showResult);

// console output:
// productList (at 1st second)
// userList (at 2nd second)

Figure 10: A Use Case of Non-blocking

The code snippet fires two API calls, getUserList and getProductList, at the same time, and the backend executes them in parallel. At the 1st second, the getProductList API call returns its result, and at the 2nd second, the getUserList API call returns its result. In total, the two API calls take 2 seconds to complete, which is faster than executing them in sequence (that would take 3 seconds).

To benefit from non-blocking calls, we have to define callback functions to get notified as soon as the asynchronous operations are done. Fig.11 shows a callback stack chaining multiple asynchronous functions.

function asyncTask(msg, callback) {
  setTimeout( () => { console.log(msg); callback(); }, 0);
}

function job1(callback) { asyncTask('job1 done', callback); }
function job2(callback) { asyncTask('job2 done', callback); }
function job3(callback) { asyncTask('job3 done', callback); }

// main
job1( () => {
  job2( () => {
    job3( () => {
      console.log('all jobs done');
    });
  });
});

Figure 11: Callback Hell

There are three asynchronous functions, job1, job2, and job3, in this example, and they are chained with callbacks as shown in the main code block. The code does the job of chaining all the jobs together, but it looks ugly and is hard to understand, and it would be even harder to read if we stacked more callback functions. The readability problem caused by nesting one callback function in another is called callback hell, which is a big issue when using NodeJS in a large code base.

To address the callback-hell issue at the language level, NodeJS introduced the Promise object with ES6 [17].

function asyncTask(msg) {
  return new Promise( (resolve) => {
    setTimeout( () => { console.log(msg); resolve(); }, 0);
  });
}

function job1() { return asyncTask('job1 done'); }
function job2() { return asyncTask('job2 done'); }
function job3() { return asyncTask('job3 done'); }

// main
Promise.resolve()
  .then(job1)
  .then(job2)
  .then(job3)
  .then( () => { console.log('all jobs done'); });

Figure 12: Promise in ES6

The Promise object simplifies the code structure for chaining asynchronous functions, thus making the code more intuitive and easier to read. Fig.12 shows an example of chaining three jobs with Promise. With the Promise object, we can read the code in a more semantic way: do job1, then do job2, then do job3.


However, the Promise solution is still far from perfect. If we involve a branch statement, as shown in Fig.13, the code looks a bit semantically vague, even though it is much better than the callback version.

function asyncTask(value) {
  return new Promise( (resolve) => {
    setTimeout( () => { console.log(value); resolve(value); }, 0);
  });
}

function job1() { return asyncTask(true); }
function job2() { return asyncTask(2); }
function job3() { return asyncTask(3); }

// main
Promise.resolve()
  .then(job1)
  .then( (jobReturn) => {
    if (jobReturn) {
      return job2();
    } else {
      return job3();
    }
  })
  .then( () => { console.log('all jobs done'); });

Figure 13: Promise with Branch Statements

Thanks to the async/await syntax introduced in ES8 [18], we are able to simplify the code even further. Fig.14 shows an example of how to use async/await to chain our asynchronous functions.


// skip the job definitions

async function scheduler() {
  const jobResult = await job1();
  if (jobResult) {
    await job2();
  } else {
    await job3();
  }
  console.log('all jobs done');
}

// main
scheduler();

Figure 14: Example of Async/Await

From a semantic perspective, the await keyword means waiting for the result of an asynchronous function. With async/await support, we can chain asynchronous functions in the same way as synchronous functions, except for adding the await keyword in front of each asynchronous function call. Moreover, by combining await with Promise, we can attain the power of parallelism from Promise without losing the convenience of await, as shown in Fig.15.

// skip the job definitions

async function scheduler() {
  const jobResult = await job1();
  if (jobResult) {
    await Promise.all([job2(), job3()]);
  }
  console.log('all jobs done');
}

// main
scheduler();

Figure 15: Combine Await/Async with Promise


4.2 Container

The container is the key component of Docker, which is widely used in this project. The software package run in a container is called a container image, which includes everything needed to run an application: the application code, runtime, configuration, system libraries, and so on. The container isolates applications from their running environments. For example, to run a NodeJS application, we normally need to install the NodeJS runtime on the server. With containers, we instead install the NodeJS runtime into a container image and run the image in Docker. From the server's perspective, we only need to install the Docker software, without worrying about the application dependencies, because they are packed into images.

A container looks very similar to a VM (Virtual Machine), but they are fundamentally different. From the feature perspective, both isolate applications from their environments: in Docker, applications run in containers; with VMs, applications run in guest OSs. We do not need to install any application dependencies on the host OS in either case. However, the technologies and design ideas behind them are totally different. Containers run as separate processes and share the operating system kernel, while VMs simulate hardware and run a full copy of an operating system. Fig.16 shows the processes of a container and the application inside.

root 32285  7183 0 docker-containerd-shim e86f7afe882e5235e203669aeef
root 32335 32285 0 npm
root 32490 32335 0 sh -c node server.js
root 32491 32490 0 node server.js

Figure 16: Processes of a Container and the Application Inside

VMs are designed to serve as complete operating systems, in which we run all kinds of applications and tools. Containers are designed to be used as individual applications, in a similar way to how we use command line tools, e.g. using a container as a command line tool to show the date, as shown in Fig.17.

docker run --rm debian:jessie date
>> Sat Nov 18 19:13:27 UTC 2017

Figure 17: Using a Container as a Command Line Tool

There are different ways to use Docker, and running a container as an application is the most basic usage. For example, I use the official container images of Nginx and Solr in my development environment to simplify the setup process; the scripts are shown in Fig.18.

docker run --name algohub.static \
  -v ${PWD}/../static_new/build:/usr/share/nginx/html:ro \
  -v ${PWD}/nginx/nginx.conf:/etc/nginx/nginx.conf \
  -p 8301:80 -d nginx:1.12

docker run --name algohub.solr -d -p 8984:8983 \
  -v ${PWD}/solr:/solr_home \
  -e SOLR_HOME=/solr_home solr:6.4.1

Figure 18: Containers in Development Environment

In the production environment, using Docker swarm mode makes it easy to set up and scale an application. It is common sense to set up more than one instance of a web service to eliminate the single point of failure, so in this project I use Docker swarm mode [19] to deploy the web services, such as the searcher, the manager, and the static content. Building customized images, linking services, and controlling services are the major steps to deploy applications in Docker swarm mode.

Building a customized image packs the code, configuration, and runtime into a container image, and there are several steps to follow. First, we need to create a file named Dockerfile to instruct Docker what content should be included in the image.


# Dockerfile
FROM node:7.8-onbuild
EXPOSE 3080

Figure 19: A Simple Dockerfile

Fig.19 shows a simple Dockerfile, which specifies the base image name and version (node:7.8-onbuild) and exposes HTTP port 3080 (the port the NodeJS server listens on). There are many more instructions we can use in a Dockerfile to accomplish more complicated jobs, such as copying files into the image, executing a command inside the container, installing dependencies, and so on [20]. Second, we use the Docker command line tool to build the image.

docker build -t algohub_mgmt:latest .

Figure 20: Command to Build an Image

The command in Fig.20 tags the image as "algohub_mgmt" with the version "latest", and copies all files and directories in the current directory into the image, which is a built-in functionality of the base image. After we execute the command, the container image is stored in the local repository. Third, we use the Docker command line tool to push the image to a remote repository.

docker tag algohub_mgmt:latest /algohub_mgmt:latest
docker push /algohub_mgmt:latest

Figure 21: Command to Push an Image

The first command in Fig.21 tags the image with a remote repository name. The second command uploads the image to the remote repository, so that Docker on another host can access it via the network.

Linking services sets up the network that enables containers to communicate with each other. In a traditional web service infrastructure, applications can talk to each other via host names or domain names through a DNS server, given that the hosts the applications run on are assigned by the administrators. In swarm mode, however, Docker deploys applications throughout a cluster of hosts, and adjusts the distribution on the fly when we instruct Docker to scale up or down. This makes it hard to know which hosts an application is actually running on, and even harder to modify the DNS records accordingly. For example, suppose we have a Docker swarm cluster with two hosts named node1 and node2, and we deploy an application with only one instance; the application could run on either node1 or node2. To talk to this application, we would have to ask Docker where the application currently is, which is not an efficient approach. Fortunately, Docker provides a way to access a service by its name, without any awareness of the service distribution details.

[Diagram: the service "myapp" at 10.0.0.1 load-balancing across two instances at 10.0.0.2 and 10.0.0.3]

Figure 22: Docker Swarm Load Balance

Fig.22 shows a service distribution example, in which a service named myapp has two instances. If another service wants to communicate with myapp, it can simply make an HTTP call to http://myapp/ (assuming myapp is a web service), and Docker's built-in load balancer will relay the request to one of the instances. In this project, I adopt this approach to link the application load balancer, which dispatches requests by domain name, with all the web services, such as the manager, the searcher, the site content, and the static resources.


Controlling services covers creating and scaling services. Docker provides a straightforward interface, the "docker service" command, for this job.

docker service create --replicas 2 --name algohub_mgmt /algohub-mgmt:latest

docker service update --image /algohub-mgmt:latest algohub_mgmt

docker service scale algohub_mgmt=4

Figure 23: Commands to Control a Service

In Fig.23, the first command creates a service with 2 instances. The second command updates the container image of the service to the latest version, which we can use to deploy new code. The third command scales the service up to 4 instances, in case there is too much traffic to serve.

4.3 CI/CD

In software engineering, Continuous Integration (CI) is the practice of merging all developer working copies to a shared mainline several times a day [21], and Continuous Delivery (CD) is an approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time [22]. The idea behind CI and CD is to integrate, build, and package the code frequently, so that the code is always ready to deploy.

In this project, I use Jenkins, one of the most popular CI/CD tools, as the build system. There are four modules to integrate and deploy, and their configurations and build processes are completely different. Doing these jobs manually would be a heavy workload, which is why an automated build system, Jenkins, is used.

A Pipeline in CI/CD is a sequence of expressions defining the whole process, from fetching the source code from version control to deploying the application package in the production environment. In Jenkins, the Pipeline can be defined in a text file (called a Jenkinsfile), a manner known as "Pipeline-as-Code". Fig.24 shows the Jenkinsfile that builds the manage module.

node('NodeRaw') {
    try {
        stage('Clone Source') {
            checkout scm
        }
        stage('Production Config') {
            def JEKYLL_CFG_ID = '3460009a-5013-467a-9b44-d29a922267e0'
            def JEKYLL_CFG_FILE = '_config.yml'
            configFileProvider([configFile(fileId: JEKYLL_CFG_ID, variable: 'CONFIG_YML')]) {
                sh "cp \"${CONFIG_YML}\" ${JEKYLL_CFG_FILE}"
            }
        }
        stage('Build html') {
            JEKYLL_VERSION = '3.5'
            docker.image("jekyll/jekyll:${JEKYLL_VERSION}").inside {
                sh 'bundle install'
                sh 'bundle exec jekyll build'
            }
        }
        stage('Build Docker image') {
            def newImage = docker.build("algohub-site")
            docker.withRegistry("https://239150759114.dkr.ecr.us-west-1.amazonaws.com", "ecr:us-west-1:aws-ecr-cred") {
                newImage.push("${env.BUILD_ID}")
                newImage.push("latest")
            }
        }
    } finally {
        stage('Cleanup') {
            cleanWs notFailBuild: true
        }
    }
}

Figure 24: Jenkinsfile for Manage Module

This Jenkinsfile defines the process of building a ready-to-deploy Docker image in multiple stages: fetching the source code from version control; overwriting the production configuration; generating HTML files with Jekyll; and packaging the HTML files into a Docker image. After executing the operations defined in the Jenkinsfile, Jenkins provides a visual report of the Pipeline result, as shown in Fig.25.

Figure 25: Screenshot of Jenkins Pipeline Report

One lesson I learned from setting up Jenkins is that it is better to deploy the Jenkins master and agents on different hosts, which is also a recommendation from the official Jenkins documentation. At first, I set up Jenkins (one master and two executors on the same host) on an Amazon EC2 host of type t2.micro (1 CPU core and 1 GB of memory). Right after the setup was done, I found that Jenkins stopped responding whenever a build was in progress. Executing any command on the host produced an "out of memory" error, so I considered upgrading to a more powerful EC2 type. The second try was on an EC2 host of type t2.medium (2 CPU cores and 4 GB of memory). With this configuration, Jenkins worked fine: during a build, the CPU load stayed between 1 and 2, and the memory consumption between 2 GB and 3 GB. However, using a dedicated t2.medium host for the build system is relatively expensive, costing around $40 per month, compared to $30 for the three t2.micro hosts that serve all the modules (front-end and back-end). The third try was based on this configuration: a master (1 CPU core and 1 GB of memory) and two agents on separate hosts (each with 1 CPU core and 1 GB of memory). Meanwhile, I set up this Jenkins cluster on linode.com for a lower cost (around 50% lower than EC2). With this master-agent configuration, the Jenkins master performed much better, consuming around 50% of the hardware resources of the tiny host: the CPU load was 0.5, and the Jenkins process consumed around 43% (430 MB) of the total memory while two jobs ran in parallel.


CHAPTER 5: CONCLUSION AND FUTURE WORKS

The AlgoHub project implemented a CMS-like website with a database-less solution, Jekyll. In this solution, all contents are stored in a Git repository and built into HTML files by the build system. Since static HTML web servers respond much faster than dynamic ones, AlgoHub attains high performance from this solution. To support YouTube videos in the content, a customized Jekyll plugin has been implemented, which is also an exercise in extending Jekyll. A downside of this solution is that it can take tens of seconds for the web servers to pick up the latest content changes, because the build system needs time to generate the HTML files. Most AlgoHub users simply see the old content until the updated version reaches the web servers, so the delay goes unnoticed (except by the author). However, if the same solution were adopted for a time-sensitive application, further improvements would be required.

The project uses NodeJS as the main programming language, which is lightweight without compromising performance. Parallel execution is hard to implement in many scripting languages, such as PHP, but the event-based programming model of NodeJS makes asynchronous calls possible. For example, pushing changes to the Git repository and updating the search index are executed simultaneously when the user submits a modified algorithm. Every coin has two sides, though: asynchronous calls lead to a well-known software engineering problem, callback hell. By adopting the async/await syntax introduced in ES8, AlgoHub maintains the parallelism without sacrificing readability.

To further improve the project from a software engineering perspective, refactoring could be conducted to adopt other new ES8 features, such as class definitions, shorthand method names, and constructor functions.


As part of the experimental purpose, the project exercised a micro-service architecture, in which Docker played an important role. All modules of AlgoHub (manage, contents, searcher, and static resources) are packaged as container images and deployed with Docker swarm mode. The container technology greatly reduces the effort of setting up runtime environments, letting the developers focus more on implementing features. For example, we can use the pre-built NodeJS image to run the application without learning how to install the NodeJS runtime on Linux boxes. Moreover, Docker swarm mode simplifies the process of distributing and scaling the applications: with the registry (container image repository), we can control a cluster of Docker hosts with just a few commands. Due to the need to persist data, the backend of the searcher module, Solr, is still kept outside the swarm. More research and experimentation on how to deploy database applications in a cloud environment are needed before Solr can be included in the swarm cluster.

Jenkins automates the process of building and deploying the application code and, more importantly, generating the HTML files from user input. The Jenkins Pipeline is leveraged to define the steps of the automation jobs and helps to visualize the build results. A Jenkins cluster has also been set up, balancing cost against performance.

To test the features of AlgoHub, a demo algorithm has been added (http://www.algohub.me/algo/depth-first-search.html), which includes standard Markdown tags and the customized YouTube tag. In the coming CS530 online exam, there will be a question requiring the students to add algorithms of interest to AlgoHub, so that AlgoHub can be further tested. The question is shown in Appendix A.


REFERENCES

[1] Wikipedia, "Algorithm," [Online]. Available: https://en.wikipedia.org/wiki/Algorithm. [Accessed May 2017].

[2] D.F.C. LLC, "Introduction," [Online]. Available: http://daringfireball.net/projects/markdown. [Accessed May 2017].

[3] jekyllrb.com, "Introduction," [Online]. Available: https://jekyllrb.com. [Accessed May 2017].

[4] Wikipedia, "Content management system," [Online]. Available: https://en.wikipedia.org/wiki/Content_management_system. [Accessed May 2017].

[5] Automattic Inc., "Wordpress," Automattic Inc., [Online]. Available: https://wordpress.com/. [Accessed May 2017].

[6] L. Torvalds, "Git," [Online]. Available: https://git-scm.com/. [Accessed May 2017].

[7] Wikipedia, "Version control," [Online]. Available: https://en.wikipedia.org/wiki/Version_control. [Accessed May 2017].

[8] GitHub Inc., "GitHub," [Online]. Available: https://github.com/. [Accessed May 2017].

[9] Node.js Foundation, "NodeJS," [Online]. Available: https://nodejs.org/en/. [Accessed May 2017].

[10] Node.js Foundation, "Express," [Online]. Available: https://expressjs.com/. [Accessed May 2017].

[11] WikiPedia, "Model-view-controller," [Online]. Available: https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller. [Accessed May 2017].

[12] [email protected], "Embedded Javascript templates," [Online]. Available: https://github.com/mde/ejs. [Accessed May 2017].

[13] Docker Inc., "Docker," [Online]. Available: https://www.docker.com/. [Accessed May 2017].

[14] apache.org, "Apache Solr," Apache Software Foundation, [Online]. Available: http://lucene.apache.org/solr/. [Accessed Sep. 2017].


[15] Jenkins.io, "Jenkins," Jenkins.io, [Online]. Available: https://jenkins.io/. [Accessed May 2017].

[16] Node.js Foundation, "The Node.js Event Loop, Timers, and process.nextTick()," [Online]. Available: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/. [Accessed Oct. 2017].

[17] Ecma International, "ECMAScript 2015 Language Specification," [Online]. Available: http://www.ecma-international.org/ecma-262/6.0/#sec-promise-objects. [Accessed Oct. 2017].

[18] Ecma International, "ECMAScript 2017 Language Specification," [Online]. Available: https://www.ecma-international.org/ecma-262/8.0/. [Accessed Oct. 2017].

[19] Docker Inc., "Swarm mode overview," [Online]. Available: https://docs.docker.com/engine/swarm/. [Accessed Oct. 2017].

[20] Docker Inc., "Dockerfile reference," [Online]. Available: https://docs.docker.com/engine/reference/builder/. [Accessed Oct. 2017].

[21] Wikipedia, "Continuous integration," [Online]. Available: https://en.wikipedia.org/wiki/Continuous_integration. [Accessed Oct. 2017].

[22] Wikipedia, "Continuous delivery," [Online]. Available: https://en.wikipedia.org/wiki/Continuous_delivery. [Accessed Oct. 2017].


APPENDIX A: QUESTION IN CS530 ONLINE EXAM

Do some research on an algorithm you are interested in, and add the algorithm to http://www.algohub.me. Write down the link to your newly added algorithm as the answer.

The content you input should include the following sections:

• Algorithm name (alphanumeric characters and spaces)

• Tags

• Description

• Pseudo Code

• Complexity

• Education

• Applications

Tips: You may look at the example (http://www.algohub.me/algo/depth-first-search.html) to get a sense of what those sections mean.
