UMEÅ UNIVERSITY
Department of Computing Science
Bachelor Thesis, 15 hp
June 3, 2021

5DV199 Degree Project: Bachelor of Science in Computing Science

A scalability evaluation on CockroachDB

Tobias Lifhjelm

Teacher: Jerry Eriksson
Tutor: Anna Jonsson

Contents

1 Introduction
2 Database Theory and Architecture
  2.1 Transaction
  2.2 Load Balancing
  2.3 ACID
  2.4 CockroachDB Architecture
    2.4.1 SQL Layer
    2.4.2 Replication Layer
    2.4.3 Transaction Layer
      2.4.3.1 Concurrency
    2.4.4 Distribution Layer
    2.4.5 Storage Layer
3 Research Questions
4 Related Work
5 Method
  5.1 Data Collection
  5.2 Delimitations
6 Results
7 Discussion
  7.1 Conclusion
  7.2 Limitations and Future Work
8 Acknowledgment

Abstract

Databases are a cornerstone of data storage: they store and organize large amounts of data while allowing users to access specific parts of it easily. They must, however, adapt to an increasing number of users without negatively affecting the end-user experience. CockroachDB (CRDB) is a distributed SQL database that combines the consistency associated with relational database management systems with the scalability needed to handle more simultaneous user requests. This paper presents a study that evaluates the scalability properties of CRDB by measuring how latency is affected by the addition of more nodes to a CRDB cluster. The findings show that latency can decrease with the addition of nodes to a cluster. However, there are cases where more nodes increase the latency.

1 Introduction

Databases have been a hot topic over the past decades. They are useful for long-term storage of related data: a database can store large amounts of data and organize it so that it is easy to search for and fetch specific parts of it. Databases are constantly under development, however, and new types of databases keep entering the market to meet new demands. Furthermore, with the growth of the internet and more users connecting every year, databases have to adapt to this demand to maintain their usefulness [1].

In 1970, Edgar F. Codd [2] introduced a new model to organize data independently, reduce redundancy, and improve data storage consistency. With this new model, he set the foundation for the coming revolution in database systems, namely relational database management systems (RDBMS). RDBMS dominated the industry over the 1980s and '90s, offering atomicity that ensures that the database remains in a consistent state, efficient queries, and efficient disk space usage. RDBMS organize data into tables related to each other, thus enabling the retrieval of new tables consisting of data from one or more tables in a single query. This retrieval is a database operation known as a table join. Users communicate with these database systems using the Structured Query Language (SQL). SQL allows for table joins without the need to know where tables reside on disk. SQL syntax offers several operations to read from and modify tables in a database. Some frequently used operations are the following (a usage sketch is shown after the list):

• Select requests data from the database.

• Update modifies existing data.

• Insert adds data to the database.

• Delete removes data.
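
As an illustration, the four operations might be issued as follows against a hypothetical accounts table. The table, the connection URL, and the use of the PostgreSQL JDBC driver (CRDB speaks the PostgreSQL wire protocol) are assumptions for this sketch, not part of the study's setup:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SqlOperations {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection to a local CRDB node (default port 26257).
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:26257/bank?user=root");
                 Statement stmt = conn.createStatement()) {
                // Insert: add a row to the table.
                stmt.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)");
                // Select: request data from the table.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT balance FROM accounts WHERE id = 1")) {
                    while (rs.next()) System.out.println(rs.getLong("balance"));
                }
                // Update: modify existing data.
                stmt.execute("UPDATE accounts SET balance = 200 WHERE id = 1");
                // Delete: remove the row again.
                stmt.execute("DELETE FROM accounts WHERE id = 1");
            }
        }
    }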

Traditional SQL databases were architected to run on a single server. However, with internet workloads growing larger than any single server can handle, a need emerged to move from single database servers to distributed servers working in unity as one big logical database. Scalability, the ability to add or remove resources to handle changing demands, became an important feature. A server that is part of a bigger cluster is also called a node. The ability of a database to tolerate node failures is also essential: it means that the database remains intact even if some node (server) crashes, which is an important factor in ensuring that data is always available. For this reason, NoSQL became an alternative that is easy to scale and tolerant of node failures. However, NoSQL comes with a trade-off in the lack of table joins and other SQL properties [3]. Then came a new type of database architecture that offers scalability without the drawbacks of NoSQL, called distributed SQL.

This new type of database can distribute data globally (geo-distribution) while providing familiar SQL syntax and properties [3]. CockroachDB (CRDB) [4] is one of these distributed SQL databases.

CRDB is a distributed SQL database developed by Cockroach Labs. Interest in it has grown in recent years, especially in settings where both consistency, meaning that the database always goes from one valid state to another, and scalability are highly valued. A CRDB database cluster consists of nodes. Each node is a separate logical sub-database, and all nodes in a cluster act together as one big logical database. CRDB scales easily by automatically increasing capacity and migrating data when nodes are added [5]. CRDB is described as resilient to node crashes and highly scalable while still providing consistency across the whole cluster.

A CRDB node consists of several ranges, depending on the size of the data in the database [5]. Ranges are sub-parts of the data in the database that are distributed across different nodes. To ensure that data is not lost if a node crashes or goes offline, these ranges are replicated across nodes, and the replicated ranges are called replicas. Each node in a CRDB cluster works as a gateway node [5], which means that every node can establish SQL connections directly to a client. When a client starts a SQL connection to a node, it establishes a network connection that it can use to communicate with the database using SQL syntax. Furthermore, it can request any data from the database, not just the data stored on that node. A node connected to a client either resolves the request directly, if it holds the data, or communicates with the other nodes to resolve it. Because every node is a gateway node, CRDB works well with load balancers [6]. A load balancer distributes and establishes SQL connections equally across the nodes in the cluster.

Scalability in database management is divided into two categories, horizontal and vertical [7]. Horizontal scaling is the ability to partition data so that each node contains a part of the data and runs on a different server. Vertical scaling refers to adding better hardware to a single installation, such as more memory or a better CPU. The purpose of scaling is to handle changing demands, often to handle greater throughput due to an increasing number of users or requests. Throughput can be the number of writes and reads per second, or queries and transactions per second, performed by the database. The difference between a query and a transaction is that a query is a single statement, whereas a transaction can consist of multiple queries. Another purpose of scaling is to reduce request latency. In this study, latency refers to the time it takes for a user to send a request and get a response. This study focuses on horizontal scaling.

This study evaluates CRDB, focusing on latency, its correlation with the number of connections, and how it is affected by the size of the cluster. The evaluation is done by setting up CRDB clusters in the cloud and implementing a test that measures latency. Doing this gives insight into how CRDB adapts to the new demands that databases are exposed to.

The outline of this study is as follows:

1. Central concepts are defined, and we look at the architecture of CRDB nodes.

2. The research question answered in this study is presented along with its hypothesis.

3. A description of the related work for this study.

4. The method description with the choice of method, data collection, and the delimitations of this study.

5. Presentation of the results from the tests.

6. The discussion with the interpretation of results, limitations, conclusion, and future work.

2 Database Theory and Architecture

In RDBMS, a database consists of one or more tables. A table is a collection of related data with a specified number of vertical columns identified by name. Rows, also known as records, are identified by the values in one or more columns. To make each row unique in a table, one column can be a primary key. A primary key is a constraint on a column that does not allow two rows to have the same value in that column.

Workload refers to the type of job a database performs. Workloads may differ depending on what requests the database receives. For example, one workload can be that all requests to the database consist of update operations, and another can be that all requests consist of select operations. A SQL connection establishes a network connection to the cluster, through which a client can make requests to the CRDB cluster using SQL syntax [8].

2.1 Transaction

A transaction starts with a BEGIN statement, which means that the following statements are treated as one transaction. The transaction then continues with one or more statements, such as update or insert. After all statements are complete, the transaction ends with a COMMIT statement [9]. The COMMIT statement stops the ongoing transaction and makes the changes permanent. If the transaction cannot commit, the database rolls back to the state before the transaction started.
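
As a sketch, the same pattern can be expressed through JDBC, where disabling auto-commit corresponds to BEGIN and commit() or rollback() ends the transaction; the connection details and statements are illustrative assumptions, not the thesis's actual code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class TransactionExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:26257/bank?user=root")) {
                conn.setAutoCommit(false); // subsequent statements form one transaction (BEGIN)
                try (Statement stmt = conn.createStatement()) {
                    stmt.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1");
                    stmt.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2");
                    conn.commit();   // COMMIT: make the changes permanent
                } catch (Exception e) {
                    conn.rollback(); // return to the state before the transaction
                    throw e;
                }
            }
        }
    }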


2.2 Load Balancing

Load balancing refers to the act of distributing connections and incoming network traffic across different servers. A load balancer can efficiently distribute client requests or network load across multiple servers and only send requests to servers that are available.

2.3 ACID

As in traditional SQL databases, CRDB follows the ACID properties. ACID is the collective name for four characteristics: atomicity, consistency, isolation, and durability [7].

• Atomicity: A transaction that successfully completes should commit. If, for some reason, a transaction fails, it should be rolled back to the state before the transaction.

• Consistency: A database should always be consistent, meaning that a transaction always brings the database from one valid state to another. The data is checked for consistency before and after each transaction.

• Isolation: A transaction does not observe the transitional state of another running transaction.

• Durability: The database should never lose committed transactions.

2.4 CockroachDB Architecture

CRDB is a distributed SQL database built to scale, where each node in the cluster is used both to store data and for computations. Each node is organized into layers: distinct mechanisms that work both separately and together as a unit in the cluster. Every node in a CRDB cluster consists of several layers: SQL, transactional key-value (KV), distribution, replication, and storage [5]. This section describes the underlying structure of CRDB nodes, their functionality, and how the different layers depend on each other.

2.4.1 SQL Layer

The highest layer of a node is the SQL layer [5]. It works as an interface between users and the database, converting SQL statements into read and write requests to the underlying key-value store. Since nodes in a CRDB cluster behave symmetrically, each node can receive a request from a user, and the node that receives a request works as a gateway node between the cluster and the user. This ability makes CRDB work well with load balancers that distribute connections between the different nodes in the cluster [6].

2.4.2 Replication Layer

The replication layer exchanges requests and responses with the distribution layer and sends write requests to the storage layer. Its primary purpose is to copy data between replicas and ensure consistency between them. By default, CRDB replicates each range into three copies and stores them on three separate nodes to provide higher availability. Thus, the database can remain complete even when a node goes offline or crashes. CRDB uses a consensus algorithm to ensure that a quorum of replicas agrees on changes done to ranges before committing the changes [5].

For consensus between ranges and replicas, CRDB uses the Raft consensus algorithm [5], where all replicas of a range form a Raft group. The Raft group consists of a leader, which coordinates all writes in the group, and followers. The leader periodically sends heartbeats to the followers to let them know that it is still alive and to keep their logs replicated. When a write receives a quorum and is committed by the leader, it is appended to the Raft log. The Raft log is an ordered set of commands that the replicas have agreed on. It is treated as serializable and is used to bring nodes to the current state and to update nodes that have previously gone offline [10].

For every Raft group, one replica (usually, but not exclusively, the Raft leader) acts as the leaseholder. The leaseholder is the only replica that can serve up-to-date reads and propose writes to the Raft leader. Therefore, reads can bypass the networking round trips required by the Raft protocol without abandoning consistency [5]. When nodes are added to the cluster, replicas are automatically rebalanced: the joining node sends information about itself to the existing nodes, and the cluster rebalances some replicas onto the new node [10].
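
To make the quorum requirement concrete, a write commits once a majority of the replicas acknowledge it. A minimal sketch of that arithmetic (illustrative only, not CRDB code):

    public class Quorum {
        // Majority quorum: strictly more than half of the replicas must agree.
        static int quorumSize(int replicationFactor) {
            return replicationFactor / 2 + 1;
        }

        public static void main(String[] args) {
            // With CRDB's default replication factor of three, two replicas
            // must acknowledge a write; one node can fail without data loss.
            System.out.println(quorumSize(3)); // prints 2
            System.out.println(quorumSize(5)); // prints 3
        }
    }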

2.4.3 Transaction Layer

The third layer of a CRDB node is the transactional KV layer, which receives KV operations from the SQL layer and controls the flow of KV operations to the distribution layer. The transaction layer provides consistency and ensures atomicity and isolation guarantees in CRDB [11]. To provide consistency across the whole cluster and serializable isolation, CRDB uses a type of multi-version concurrency control (MVCC) [5] and treats all statements as transactions. When a node creates a SQL connection with a user, it becomes the gateway node for that user's requests: it receives requests from and responds to the client and acts as the transaction coordinator.


The SQL layer sends KV operations to the transaction layer, but SQL typically requires a response to the current operation before initiating the next one [5]. To avoid stalling transactions while operations are replicating, the transaction coordinator uses two optimizations. The first optimization is write pipelining [11]. In the write pipeline, the gateway node contacts the leaseholders of the ranges it wants to write to. When a leaseholder receives a request, it creates write intents, sends them to its followers, and responds to the gateway node that the write intents have been sent. Before the gateway node commits the transaction, it waits for the leaseholders to confirm that all write intents have been applied [11].

The other optimization is parallel commits [11], which improves response time for transactions by letting the write pipeline replicate and commit operations in parallel. Usually, a transaction must wait for all writes to replicate before committing, meaning that with a replication factor of three, it must wait for two sequential rounds of consensus before committing [5]. The parallel commits protocol, however, introduces a staging transaction status. The write pipelining protocol enables a transaction to move on to the following statement before write intents have replicated from the leaseholders. When the transaction coordinator receives a commit statement, it creates a transaction record, sets the record's state to staging, and records each key and write operation without knowing whether the writes have completed. The transaction coordinator then waits for all pending writes to succeed and returns a message to the client that the transaction is committed. It then sets the transaction record's state to committed and resolves the transaction's write intents asynchronously [11].

2.4.3.1 Concurrency

CRDB is an MVCC system that maintains concurrency using commit timestamps: every read and write operation in a transaction uses commit timestamps. This means that all transactions in the system are ordered by timestamp and represent a serializable execution [5]. However, conflicts may arise between transactions, and these are usually resolved by adjusting the commit timestamp.

In a write-read conflict, a read that encounters an uncommitted intent with a lower timestamp waits for the earlier transaction to complete before continuing, whereas a read that encounters an uncommitted intent with a higher timestamp ignores that intent and continues. In a read-write conflict, a write cannot be performed on a key that has already been read at a higher timestamp; instead, the writing transaction advances its timestamp past the timestamp of the previous read. A write-write conflict occurs when a write comes upon an intent with a lower timestamp and is resolved by waiting for the earlier transaction to complete before continuing; if the write comes upon an intent with a higher timestamp, it advances its timestamp past the earlier intent's timestamp [5]. In some cases, a deadlock can occur in write-write conflicts. In these cases, CRDB has a distributed deadlock-detection algorithm [5] to remove one of the transactions.
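
A conceptual sketch of these three rules in code; the names, types, and structure are invented for illustration and are not CRDB's implementation:

    public class ConflictRules {
        enum Action { WAIT_FOR_EARLIER_TXN, IGNORE_INTENT, ADVANCE_TIMESTAMP }

        // Write-read conflict: a read meets an uncommitted write intent.
        static Action onRead(long readTs, long intentTs) {
            return intentTs < readTs ? Action.WAIT_FOR_EARLIER_TXN
                                     : Action.IGNORE_INTENT;
        }

        // Read-write conflict: a write meets a key already read at a higher
        // (or equal) timestamp; the writer advances past the read.
        static long onWriteAfterRead(long writeTs, long maxReadTs) {
            return writeTs <= maxReadTs ? maxReadTs + 1 : writeTs;
        }

        // Write-write conflict: a write meets another transaction's intent.
        static Action onWrite(long writeTs, long intentTs) {
            return intentTs < writeTs ? Action.WAIT_FOR_EARLIER_TXN
                                      : Action.ADVANCE_TIMESTAMP;
        }
    }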

2.4.4 Distribution Layer

In order to make all the data in the database available from any node in the cluster, the database stores data as an abstraction of a monolithic logical keyspace ordered by key [5]. The keyspace keeps track of all data in the cluster and where it is located. The data is partitioned into ranges, contiguous chunks of the keyspace, which keeps related data close together, enabling efficient scans within a particular range while distributing data across the cluster [12].

The distribution layer is also responsible for routing each subset of a query to the right range. A range always starts empty and grows as more data is added within it; the database splits a range if it grows too large and merges ranges if they get too small. In addition, CRDB can by default split ranges based on load to reduce imbalances in CPU usage between the nodes [5].
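
A conceptual sketch of routing over an ordered keyspace: each range is identified by its start key, and a key is routed to the range with the greatest start key not exceeding it. All names and the node mapping are invented for illustration; this is not CRDB code:

    import java.util.TreeMap;

    public class Keyspace {
        // Start key of each range -> id of the node holding that range.
        private final TreeMap<String, Integer> ranges = new TreeMap<>();

        void addRange(String startKey, int nodeId) {
            ranges.put(startKey, nodeId);
        }

        // Route a key to the range whose start key is the greatest one <= key.
        int lookup(String key) {
            return ranges.floorEntry(key).getValue();
        }

        public static void main(String[] args) {
            Keyspace ks = new Keyspace();
            ks.addRange("", 1);  // keys before "m" live on node 1
            ks.addRange("m", 2); // keys from "m" onward live on node 2
            System.out.println(ks.lookup("accounts/42")); // prints 1
            System.out.println(ks.lookup("users/7"));     // prints 2
        }
    }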

2.4.5 Storage Layer

The storage layer serves successful reads and writes by committing writes from the Raft log to disk and returning requested data to the replication layer. By default, CRDB uses the RocksDB engine as the underlying key-value storage for reads and writes to disk. RocksDB provides atomic write batches and simplifies the mapping to the key-value layer [13].

3 Research Questions

The purpose of this study is to explore how the number of connections impacts latency and whether scaling up CRDB clusters can optimize the usage of multiple simultaneous connections by reducing latency.

CRDB attempts to combine traditional RDBMS properties, such as consistency and the other ACID properties, with scalability. Consistency and scalability have previously been considered hard to combine. By distributing the connections equally across the nodes, more nodes in a cluster should mean less work for each node, and less work for each node should reduce the latency for the user. However, a larger cluster with more nodes also means more ranges distributed over additional nodes, reducing the chance of related records residing on the same node. This means more communication between nodes, which may increase latency.


Because writes must be applied serializably across all replicas, more communication between nodes may be needed to reach consensus, as explained in Section 2.4.3. Also, more nodes mean more simultaneous transactions, which may lead to more write and read conflicts that need to be resolved, impacting latency.

One aspect of scaling is whether a database can efficiently handle a larger number of simultaneously established SQL connections while reducing latency. Furthermore, a SQL connection waits for a response to a request before sending new requests; therefore, a reduction in latency should improve the number of operations per second.

The research question is: Can the latency of a CRDB cluster that uses multiple connections be reduced by the addition of more nodes to the cluster, and is there a definite correlation between adding nodes to a cluster and reduced latency?

To answer the research question, we use the Google Cloud platform [14] to set up a distributed CRDB database with nodes distributed over compute engines, which are virtual machines with dedicated CPUs. We implement a test program that sets up multiple connections to the database and then run multiple tests with different numbers of connections and different workloads, measuring the average latency.

The hypothesis is that a larger number of established SQL connections sending requests should increase the overall latency, because more connections mean more transactions and more workload for the cluster. A larger number of simultaneous transactions also increases the risk of write and read conflicts that need to be resolved, which will also increase latency. However, by adding more nodes, the workload per node should decrease, requests should resolve faster, and latency should be reduced. Also, since each node works as a gateway node, no single node is a bottleneck; thus, there should be a reduction in latency when more nodes are added to the cluster.

4 Related Work

Many studies evaluate and compare scalability in different systems. Scalability in NoSQL databases is widely researched. However, fewer studies examine scalability in distributed SQL databases such as CRDB.

An analysis of scalability in NoSQL databases compares MongoDB, Cassandra, and HBase [7]. That study evaluates how these databases perform in throughput (number of operations per second, more precisely writes and reads) depending on the scale of the cluster and the given workload. The experiments in the study are performed on clusters of up to 13 servers, and the parameters varied are workload, number of records, and number of nodes. The workloads used are the following: 50/50 read and write operations, 100% read operations, 100% blind writes, 100% read-modify-write, and 100% scan. The record counts are one and ten million, and the cluster sizes range from two to a maximum of 13 nodes. The results show that each NoSQL database has its own design features that are advantageous for specific operation types. Furthermore, the data show that performance depends on the number of records and the number of nodes in the cluster. The paper does not cover any CRDB scalability properties but is relevant for how to test scalability in databases.

Another study analyzes CRDB and how its architecture is constructed [5]. It explains the underlying structure of how CRDB performs transactions while preserving ACID, as well as the mechanics behind how CRDB maintains consistency and how it scales. Furthermore, it evaluates some scalability properties of CRDB, analyzes CPU usage between nodes in a cluster, and analyzes how CRDB throughput is affected under a TPC-C benchmark when more nodes are added. TPC-C is an online transaction processing benchmark that tests the database under different transactions and analyzes how many operations per second the database can handle. Moreover, the study includes a comparison between CRDB and another distributed SQL database, Spanner. However, this study does not consider the correlation between the number of established connections and performance related to scaling.

Another relevant study tests dynamic timestamp allocation for reducing transaction aborts [15], which gives insight into how different workloads may affect throughput and the number of aborts.

5 Method

This section explains the chosen method to answer the research question, the experiment setup, how data is collected, and the delimitations of the experiment.

The cluster is set up with a database named Bank containing a table named Accounts. The Accounts table consists of an Id, a Data, and a Balance column. The first column, Id, is the primary key. The second column, Data, contains information about the account. The last column, Balance, holds the amount of money in that account. Each row in the database has a size of 1 kB and follows the structure shown in Table 1. This record size is chosen because a significantly larger size would be unrealistic for a single row; such data would likely be divided into multiple tables. A size too small would inhibit the distribution of data between ranges in the database.

Table 1: Data representation of a row

    ID        Data        Balance
    4 bytes   979 bytes   17 bytes
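
A schema matching this layout might be created as follows. The column types are assumptions, since the thesis does not give the exact definitions (INT4 is chosen to match the 4-byte Id):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateSchema {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:26257/?user=root");
                 Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE DATABASE IF NOT EXISTS bank");
                stmt.execute("CREATE TABLE IF NOT EXISTS bank.accounts ("
                        + " id INT4 PRIMARY KEY,"  // 4-byte identifier
                        + " data STRING,"          // account information (~979 bytes)
                        + " balance DECIMAL)");    // amount of money in the account
            }
        }
    }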


Google Compute Engine (GCE) [16] is a service that lets clients set up compute engines: virtual machines with dedicated CPUs, memory, and storage, manageable through a web console. GCE instances are available in several CPU and RAM configurations and with different storage sizes and types, such as SSD or HDD. Because of the easy access and the versatile configurations, GCE instances are used to set up the cluster and the load balancer. The load balancer HAProxy [17] distributes SQL connections equally between the nodes; it is chosen because it integrates well with CRDB [18]. CRDB version 20.2.9 is used in the tests because it was the most recent release when this study started.

The research question is answered by implementing and executing test software. The test software opens connections to a CRDB cluster through the load balancer and uses the following workloads on the database:

• Workload A: 50% select transactions and 50% update transactions.

• Workload B: 100% select transactions.

• Workload C: 100% update transactions.

The test software tests the database with 200, 1,000, and 2,000 connections, running each workload separately on clusters consisting of three, five, and nine nodes for each number of connections. The test software first runs all tests on a database with 100,000 records and then on a database with 1,000,000 records. These tests show how latency correlates with the number of connections and the number of nodes in the cluster. However, network conditions can vary depending on the time of day. The test software therefore executes during nighttime; since most users are not active during the night, the assumption is that network conditions have less impact on the test data.

Different workloads are used because other related studies show that results may differ depending on the workload [7, 5]. Studies also show that performance may differ depending on the amount of data in the database [7]. Therefore, the test uses two database setups with different numbers of records.

5.1 Data Collection

The test software is written in Java, using the JDBC API for creating connections and sending requests to the database. In the test software, update requests are part of a transaction, and each transaction consists of two operations: the transaction first draws 100 units from a random account's balance and then deposits 100 units into another randomly chosen account, simulating how a bank would make a transaction (a sketch is shown below). The select transaction consists of two select operations, each of which reads a randomly chosen account.
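
A minimal sketch of what such an update transaction might look like, assuming the Bank/Accounts schema above; the helper and statements are illustrative, not the thesis's actual code:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.concurrent.ThreadLocalRandom;

    public class BankTransfer {
        // Move 100 units between two randomly chosen accounts in one transaction.
        static void transfer(Connection conn, int records) throws Exception {
            int from = ThreadLocalRandom.current().nextInt(records);
            int to = ThreadLocalRandom.current().nextInt(records);
            conn.setAutoCommit(false);
            try (PreparedStatement draw = conn.prepareStatement(
                     "UPDATE bank.accounts SET balance = balance - 100 WHERE id = ?");
                 PreparedStatement deposit = conn.prepareStatement(
                     "UPDATE bank.accounts SET balance = balance + 100 WHERE id = ?")) {
                draw.setInt(1, from);
                draw.executeUpdate();
                deposit.setInt(1, to);
                deposit.executeUpdate();
                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }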


The test software has a thread for each connection to the database that continuously sends requests to the database for as long as the test software runs. It takes a timestamp before sending a request and one after the request resolves; these timestamps are stored, and the test software calculates the average latency (a sketch of the measurement loop is shown below). For each combination of workload, number of connections, and number of nodes, the test runs for 10 minutes and executes three times, taking the average latency from the tests.
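
A sketch of the per-connection measurement loop, reusing the BankTransfer sketch above; names and structure are illustrative, not the thesis's actual code:

    import java.sql.Connection;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class LatencyWorker implements Runnable {
        private final Connection conn;
        private final List<Long> latenciesNanos = new CopyOnWriteArrayList<>();
        private volatile boolean running = true;

        LatencyWorker(Connection conn) { this.conn = conn; }

        @Override
        public void run() {
            while (running) {
                long start = System.nanoTime();           // timestamp before the request
                try {
                    BankTransfer.transfer(conn, 100_000); // one request
                } catch (Exception e) {
                    continue; // failed requests are simply skipped in this sketch
                }
                latenciesNanos.add(System.nanoTime() - start); // timestamp after
            }
        }

        void stop() { running = false; }

        // Average latency in milliseconds over everything recorded so far.
        double averageLatencyMillis() {
            return latenciesNanos.stream().mapToLong(Long::longValue)
                    .average().orElse(0) / 1_000_000.0;
        }
    }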

The testing software executes on the following specs:

CPU: Intel(R) Core(TM) i7-6700K @ 4.00 GHz
Number of processors: 4
Memory: 24 GB
OS: Microsoft Windows 10 Pro version 10.0.18363

The settings for the nodes on the GCE cluster:

Machine type: n1-standard-2 (2 vCPUs, 7.5 GB memory)
Boot disk type: SSD persistent disk, 10 GB
OS: Debian GNU/Linux 9 (stretch)

Settings for the load balancer:

Machine type: n1-standard-4 (4 vCPUs, 15 GB memory)
Boot disk type: SSD persistent disk, 10 GB
OS: Debian GNU/Linux 9 (stretch)

5.2 Delimitations

The cloud is set up on GCE instances in Google's server centers. Therefore, the results depend on the network latency between the servers and the machine executing the workload. The experiment uses CRDB version 20.2.9 and runs in insecure mode, which means there is no network encryption or authentication.

The experiments are limited to update and select operations and do not include operations such as insert or delete. Due to the limited scope, definite generalizations are limited.

This study focuses mainly on the scalability performance of the distributed SQL database CRDB, specifically how CRDB can reduce latency. Essential attributes of a distributed SQL database, such as fault tolerance, consistency, and concurrency, are not analyzed since they are outside the scope of this research. This study also does not evaluate geo-partitioning, where clusters communicate between different regions, because this is beyond the scope of this study.


6 Results

This section presents the results from the experiment with the specifications from Section 5. The results show the average latency, in milliseconds (ms), for requests depending on the type of workload, the number of nodes, the number of established connections, and the number of records in a CRDB cluster. Workload A combines select and update transactions, workload B only uses select transactions, and workload C only uses update transactions. The connection pools consist of 200, 1,000, and 2,000 connections; the clusters have three, five, or nine nodes; and the record counts are 100,000 and 1,000,000.

Figure 1: The average latency in ms, shown on the y-axis, for a request depending on the number of nodes and workload on a database with 100,000 records and 200 established connections.

Figure 2: The average latency in ms, shown on the y-axis, for a request depending on the number of nodes and workload on a database with 100,000 records and 1,000 established connections.


Figure 3: The average latency in ms, shown on the y-axis, for a request depending on the number of nodes and workload on a database with 100,000 records and 2,000 established connections.

The results in Figure 1, from a database with 100,000 records and 200 connections, show a slight decrease in latency for workload C as the cluster grows bigger, and a slight decrease for workload A when going from a three-node cluster to five nodes. At the same time, the latency remains roughly the same under workload B.

The test results with 1,000 SQL connections on a database with 100,000 records are shown in Figure 2. These results show a lower average latency in the five-node cluster than in the three-node cluster for workloads A and C. However, they also show an increase in latency when going from five nodes to nine nodes. Furthermore, workload B consistently shows a small decrease in latency as the cluster grows. The same tendency appears in Figure 3, with the same setup as the previous test except with 2,000 established connections instead of 1,000; however, the increase in latency when going from a five-node cluster to a nine-node cluster is larger.

Figure 4: The average latency in ms, shown on the y-axis, for a request depending on the number of nodes and workload on a database with 1,000,000 records and 200 established connections.


Figure 5: The average latency in ms, shown on the y-axis, for a request depending on the number of nodes and workload on a database with 1,000,000 records and 1,000 established connections.

Figure 6: The average latency in ms, shown on the y-axis, for a request depending on the number of nodes and workload on a database with 1,000,000 records and 2,000 established connections.

Results from tests performed on a database of 1,000,000 records are shown in Figures 4, 5, and 6. The results in Figure 4, with 200 established connections, show a decrease in latency going from three nodes to five nodes in both workload A and workload C. However, the results also show an increase in latency going from five nodes to nine. At the same time, the latency remained nearly unchanged in workload B. Figures 5 and 6 show similar trends, in which latency decreases as more nodes are added to the cluster. The results also show an overall increase in latency as the number of established connections in the cluster increases.


7 Discussion

The purpose of this study is to determine whether latency can be reduced by adding more nodes to a CRDB cluster that uses multiple connections, whether there is a definite correlation between the addition of nodes and reduced latency, and also to explore how an increasing number of connections impacts latency. In order to answer the research question, experiments were conducted that measured the average latency on CRDB clusters using different workloads and cluster sizes. The setup described in Section 5 and the data acquired from the test software show that CRDB can reduce latency for a cluster with multiple connections by adding more nodes. These results are particularly visible in the experiments done on CRDB clusters with 1,000,000 records, where additional nodes reduced latency. The tendency is also visible in the results of experiments with only read operations. However, the results also show that there is no definite correlation between more nodes and reduced latency: in the tests on databases with 100,000 records, workloads consisting of write operations increased in latency when going from five-node clusters to nine-node clusters. The data also show that more connections to the cluster increase overall latency. This increase is not surprising, since more connections mean additional requests, more workload, and more transactions for the nodes to handle.

As mentioned earlier, there are some anomalies. In the data from the tests on a database with 100,000 records using 1,000 and 2,000 connections, the results show an increase in latency going from five nodes to nine nodes for workloads A and C. This result may be an outcome of write and read conflicts, as explained in Section 2.4.3.1. Too few records combined with too many nodes and connections could induce more write and read conflicts and increase the latency. Furthermore, more nodes mean more simultaneous transactions and possibly more conflicts, and because write-read and write-write conflicts that meet an uncommitted intent with a lower timestamp have to wait for the earlier transaction to complete before continuing, latency may increase. However, this assumption needs further research before a conclusion can be drawn.

One of the related works showed that results might vary depending on the size of the data and the number of nodes in a database [7]. This study shows a similar tendency: results from the tests on a database with 1,000,000 records differ from those with 100,000 records, and in the database with more records, latency clearly decreases with 1,000 and 2,000 connections for all workloads as the cluster gains more nodes. Having more records in a database probably lowers the chance of write and read conflicts and reduces waiting time for transactions. More nodes, however, mean more communication between nodes, since there is a smaller chance of a requested record being in the gateway node. Nevertheless, distributing the workload between the nodes seems to make up for the latency inflicted by the extra communication between nodes. This is probably due to the write pipelining and parallel commits optimizations described in Section 2.4.3, which improve response time for transactions. This pattern implies that a reduction in latency in a CRDB cluster with multiple connections depends on the size of the database when handling write operations or a combination of write and read operations.

Workload B, which consists of only select operations, shows the lowest latency in all tests. Since select statements only perform read operations, no changes are made to the data, and with no write operations there are no read-write conflicts; this could be one factor behind the low latency and the flatter curve, compared to the other workloads, as nodes are added to the cluster. Another factor is, of course, that read operations are faster in themselves because they do not have to write to disk.

7.1 Conclusion

This study shows that the overall latency increases as more connections are established to a CRDB cluster. Furthermore, it also reveals that latency can be reduced in a CRDB cluster that uses multiple connections by adding more nodes to the cluster. However, it also shows that latency can increase when adding more nodes to a cluster in some situations. This tendency is visible in tests on the smaller database with workloads that include write operations. There are indications that databases with more records can make better use of additional nodes to reduce latency on workloads that include transactions with write operations.

Because of the ambiguity of the results and the scope of this study, it is hard to draw a firm conclusion about when adding more nodes to a cluster decreases latency. As previously mentioned, there is an indication that databases with more records using multiple connections can make better use of more nodes to decrease latency. However, this is just an indication, and the underlying reasons need further investigation.

7.2 Limitations and Future Work

The scope of this study is quite limited and only tests CRDB under particular conditions. The complexity and diversity of database behavior are much greater than what the scope of this study could capture. Also, the transactions in this study are primitive and do not show how the database handles complex transactions. The testing environment is fairly limited in hardware variety, especially for the machines in the cluster. More diverse but similar tests should be done in future work, including different workloads, other database structures, and different hardware types. Tests should also include complex transactions to see how they may affect performance, in order to fully explore the complexity of this database. Furthermore, future work should include tests that analyze how conflicts, such as read-write conflicts, may affect latency, which could straighten out some of the anomalies in the test data.

8 Acknowledgment

A special thanks to Anna Jonsson for providing great feedback and guidance during this study. Also, a big thanks to Thomas-Emil Käck, Magnus Medin, and William Lundqvist for easing these harsh times during the corona pandemic. Finally, a big thanks to my family for believing in me, and a most special thanks to my wife Sofia Lifhjelm for all her support and help.


References

[1] ITU. Statistics. https://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx. Accessed: 2021-05-19.

[2] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387, June 1970.

[3] Cockroach Labs. A Brief History of Databases. https://www.cockroachlabs.com/blog/history-of-databases-distributed-sql/. Accessed: 2021-05-03.

[4] Cockroach Labs. Cockroach Labs, the company building CockroachDB. https://www.cockroachlabs.com/. Accessed: 2021-04-23.

[5] Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy Woods, Anne Birzin, Raphael Poss, Paul Bardea, Amruta Ranade, Ben Darnell, Bram Gruneir, Justin Jaffray, Lucy Zhang, and Peter Mattis. CockroachDB: The resilient geo-distributed SQL database. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD '20, pages 1493–1509, New York, NY, USA, 2020. Association for Computing Machinery.

[6] Cockroach Labs. SQL Layer. https://www.cockroachlabs.com/docs/v20.2/architecture/sql-layer.html. Accessed: 2021-05-03.

[7] Surya Narayanan Swaminathan and Ramez Elmasri. Quantitative analysis of scalable NoSQL databases. In 2016 IEEE International Congress on Big Data (BigData Congress), pages 323–326, 2016.

[8] Cockroach Labs. Client Connection Parameters. https://www.cockroachlabs.com/docs/v20.2/connection-parameters.html#when-to-use-a-url-and-when-to-use-discrete-parameters. Accessed: 2021-05-19.

[9] Cockroach Labs. Transactions. https://www.cockroachlabs.com/docs/v20.2/transactions.html. Accessed: 2021-05-03.

[10] Cockroach Labs. Replication Layer. https://www.cockroachlabs.com/docs/v20.2/architecture/replication-layer.html. Accessed: 2021-05-03.

[11] Cockroach Labs. Transaction Layer. https://www.cockroachlabs.com/docs/v20.2/architecture/transaction-layer.html. Accessed: 2021-05-03.

[12] Cockroach Labs. Distribution Layer. https://www.cockroachlabs.com/docs/v20.2/architecture/distribution-layer.html. Accessed: 2021-05-03.


[13] Cockroach Labs. Storage Layer. https://www.cockroachlabs.com/docs/v20.2/architecture/storage-layer.html. Accessed: 2021-05-03.

[14] Google LLC. Accelerate your transformation with Google Cloud. https://cloud.google.com/. Accessed: 2021-04-23.

[15] Vaibhav Arora, Ravi Kumar Suresh Babu, Sujaya Maiyya, Divyakant Agrawal, Amr El Abbadi, Xun Xue, Zhiyanan, and Zhujianfeng. Dynamic timestamp allocation for reducing transaction aborts. In 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), pages 269–276, 2018.

[16] Google LLC. Compute Engine. https://cloud.google.com/compute. Accessed: 2021-06-01.

[17] HAProxy. The Reliable, High Performance TCP/HTTP Load Balancer. http://www.haproxy.org/. Accessed: 2021-05-13.

[18] Cockroach Labs. Step 5. Set up load balancing. https://www.cockroachlabs.com/docs/v20.2/deploy-cockroachdb-on-premises-insecure. Accessed: 2021-05-17.
