Optimizing Application and Microservice Performance
Enhancing Oracle, MySQL, PostgreSQL, and Other Relational Databases with Replication and a Smart Cache

By Irina Rimecode

TABLE OF CONTENTS

IN MICROSERVICES WE TRUST
...OR DON'T WE?
PICK A PILL, NEO
SPLENDORS AND MISERIES OF MODERN DBMSES
SOLVING PITFALLS WITH TARANTOOL
LET'S CHECK THE NUMBERS
TESTING MYSQL, POSTGRESQL, AND TARANTOOL
SO WHAT?

Database management systems (DBMSes) like Oracle, PostgreSQL, MySQL, DB2, and Microsoft SQL Server are often responsible for key application and microservice performance pitfalls. Smart caches can solve some of these issues. Tarantool, which combines a smart cache, an application server, and a full disk and in-memory data grid, can effectively optimize existing relational architectures.

IN MICROSERVICES WE TRUST

Let's recall some of the reasons why we love microservices.

The microservice architecture rejects the monolithic application concept. Instead of executing all bounded contexts on a single server using interprocess communication, several small applications work in tandem, each corresponding to a bounded context. As a rule, microservices run on different servers and interact over the network, so a microservice application can be viewed as a kind of distributed system.

Microservices provide the comfort of not putting all of your "eggs of code" into one basket: services can be created and managed by different teams in many programming languages, and those teams have the flexibility to choose a variety of storage technologies. This simplifies both development and maintenance compared to a monolithic app, because even small changes in the latter require rebuilding and redeployment. Microservices also improve scalability, because the whole system doesn't have to be scaled each time a single module needs more capacity.

...OR DON'T WE?
With all of these unquestionable benefits, though, we should consider the ways monolithic and microservice apps treat their databases. In a monolithic app, all functional components access a single shared database directly, and any component can read data belonging to the others; all components therefore share the same degree of data retention and integrity. In a microservice system, on the other hand, each component works with its own database, which is not directly accessible to other microservices. So the "my database, not your database" problem arises: microservice data is available for reading and writing only through interfaces, and the degree of data integrity can vary from service to service.

Often while the system is running, the data of a single service provider (a "supplier") is copied in full or in part by another microservice (a "client"). When that data changes, the supplier emits an event to update the copy held by the client. The event lands in a message queue and waits for the client to receive and process it. Between those two moments, the supplier's data and the client's copy are inconsistent, although the changes will eventually be applied to all copies. This is known as "eventual consistency," and it must be taken into account when microservice systems are designed. Decentralization dictates the rules in microservice systems.

PICK A PILL, NEO

The common app pill: a conventional transaction-based approach guarantees consistency, but it leads to significant temporal coupling and creates problems when multiple services are involved.

The microservice pill: distributed transactions are very difficult to implement, so the microservice architecture coordinates services without transactions, explicitly assuming that consistency can only be eventual and that any problems that arise will be solved by compensating operations.

Another point to keep in mind is that synchronous calls between services increase downtime.
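The supplier/client flow described above can be simulated in a few lines. This is an illustrative sketch, not any particular message broker: a plain in-process queue stands in for the messaging layer, and the dictionaries stand in for the two services' databases.

```python
from queue import Queue

# Hypothetical supplier/client pair: the client keeps a local copy of the
# supplier's data and applies changes only when it drains the event queue.
events = Queue()

supplier = {"user:1": {"name": "Alice", "plan": "free"}}
client_copy = dict(supplier)  # the client starts with a full copy

# The supplier changes its data and emits an event instead of touching the client.
supplier["user:1"] = {"name": "Alice", "plan": "pro"}
events.put(("user:1", supplier["user:1"]))

# Until the client processes the queue, the two copies disagree:
# this window is exactly the inconsistency that eventual consistency allows.
assert client_copy["user:1"]["plan"] == "free"

# The client eventually drains the queue and converges with the supplier.
while not events.empty():
    key, value = events.get()
    client_copy[key] = value

assert client_copy == supplier
```

The important property is the window between `events.put` and the drain loop: any reader of the client during that window sees stale data, and the system design has to tolerate it.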
In microservice systems, synchronous calls multiply unavailability: a request succeeds only when every component in its call chain is up, so, for example, ten services that are each 99.9% available give the chain only about 99% availability. There are two options: either make the calls asynchronous, or boost your zen and accept the downtime. But the main issue is yet to come.

SPLENDORS AND MISERIES OF MODERN DBMSES

Let's consider some known headaches related to DBMSes before we switch to testing.

Relational DBMSes lack the sweet properties of cache databases: namely, high speed, low latency, and horizontal scaling. They are not called "disk" databases by accident: they must interact with data stored on disk, and this significantly affects their access speed. In addition, the secondary keys, complex queries, and stored procedures of relational DBMSes scale poorly on large clusters of computers: you can't simply add cluster nodes as needed. Horizontally scalable relational DBMSes do exist, for example NonStop SQL or Teradata, but they are expensive, require advanced hardware, and are challenging to integrate into a diverse environment of data systems.

Relational DBMSes are designed for a predefined data schema and ad hoc queries. You can write arbitrary SELECT statements, arbitrary JOIN statements, and so on. But, oops: in microservice systems it is usually the other way around. The data schema transforms dynamically, while the queries are more or less fixed and change only when the schema does.

A more cost-effective alternative to RDBMSes can be open-source, non-relational (NoSQL) databases. But they have their own drawbacks, most notably the lack of transactions. And if a cache is used in combination with a traditional DBMS, for example MySQL plus Memcached or PostgreSQL plus Redis, then you can say goodbye not only to transactions but also to stored procedures and secondary indexes. Some cache properties are lost as well (write capacity, for example, is reduced), and new problems arise, including data inconsistency and cold starts.
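The inconsistency of the DBMS-plus-cache combo is easy to demonstrate with a toy cache-aside setup. This is a sketch, not MySQL or Memcached themselves: two dictionaries stand in for the database and the cache, and the TTL value is an assumption.

```python
import time

# Toy stand-ins for a relational DBMS and a cache (MySQL + Memcached style).
database = {"product:7": {"price": 100}}
cache = {}      # key -> (value, expires_at)
TTL = 60.0      # assumed cache lifetime in seconds

def read(key):
    """Cache-aside read: serve from the cache if fresh, else fall back to the DB."""
    entry = cache.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]
    value = database[key]
    cache[key] = (value, time.monotonic() + TTL)
    return value

# The first read populates the cache.
assert read("product:7")["price"] == 100

# A write that updates only the database leaves the cache stale until the TTL
# expires; nothing ties the two stores together, because there is no transaction.
database["product:7"] = {"price": 120}
assert read("product:7")["price"] == 100  # stale value served from the cache
```

The second write would need to invalidate or update the cache in the same atomic step as the database write, which is exactly what the two-system combo cannot guarantee.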
SOLVING PITFALLS WITH TARANTOOL

A microservice system needs the advantages of both cache databases and relational databases simultaneously, and Tarantool claims to provide exactly that. Tarantool runs in memory and stores two files on disk: a snapshot of the data at a certain point in time, plus a log of all transactions. Its basic storage element is the tuple: an arbitrarily long ordered list of fields, identified by a unique primary key. Each tuple belongs to a space, and indexes can be defined on tuple fields. If you'd like an analogy to a relational DBMS, a "space" corresponds to a table and "fields" correspond to columns.

One thing Tarantool clearly fixes is the "cold start" problem. Think about the way MySQL and PostgreSQL integrate with cache databases: the application does not respond until everything is loaded into memory, and the caches warm up slowly (1-2 Mbps), so you need various hacks, like prewarming the index, to keep the two in sync (those who administer MySQL know this better than their pets' names). Tarantool, in contrast, just gets installed and runs well from the start; its cold start time is as short as possible.

"Why not just use Redis," you say? Tarantool's primary idea is to quickly process large amounts of data in memory using something richer than the key/value and other data structures that Redis provides (Tarantool's data model is closer to MongoDB's than to plain "data structures"). Imagine that you have dozens or hundreds of gigabytes of live data: Tarantool gives you the opportunity to do something really complex and sophisticated with them. Tarantool also supports transactions: you get normal begin, commit, and rollback in stored procedures, and it has secondary indexes that are updated automatically, consistently, and atomically.
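As a mental model of the space/tuple/index vocabulary, here is a plain-Python sketch. This is deliberately not Tarantool's actual API (its stored procedures are written in Lua); it only illustrates the analogy: a space holds tuples, fields are positional like columns, and a secondary index is kept in step with the data on every insert.

```python
# A "space" holds tuples; a tuple is just an ordered list of fields.
# Field positions play the role of columns; here position 0 is the primary key.
space_users = {}       # primary index: id -> tuple
index_by_email = {}    # secondary index: email -> id

def insert(tup):
    """Insert a tuple and keep the secondary index consistent with it."""
    user_id, name, email = tup
    space_users[user_id] = tup
    index_by_email[email] = user_id  # updated together with the data

insert((1, "Alice", "alice@example.com"))
insert((2, "Bob", "bob@example.com"))

# Primary-key lookup and secondary-index lookup.
assert space_users[2][1] == "Bob"
assert space_users[index_by_email["alice@example.com"]][0] == 1
```

In real Tarantool the secondary-index update happens atomically inside the engine, which is the point of the previous paragraph; in this sketch it is only as atomic as the single-threaded function that performs it.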
Finally, Tarantool simply uses less memory than Redis.

LET'S CHECK THE NUMBERS

I am an empirical person who likes to test before making choices. To run my tests, I used two entry-level VPSes, because low-power VPSes tend to reveal flaws better than powerful ones (the weaker the machine, the clearer the differences between Tarantool and its alternatives). The first machine is the server and the second is the client. I tested two traditional databases, MySQL and PostgreSQL, as well as Tarantool. Local tests of the CPU, RAM, and disk subsystem were performed with the sysbench utility.

TEST VPS CONFIGURATION

Local testing of the computing power of sabbakka-1 and sabbakka-2 (CPU + RAM and disk subsystem performance) was performed using sysbench.

1. To test the CPU, sysbench checks numbers for primality up to a limit of twenty thousand. The calculation runs in one thread by default; for parallel computation, use the --num-threads=N switch.

$ sysbench --test=cpu --cpu-max-prime=20000 run

CPU TEST RESULTS

The most peculiar thing here is the total time, because the machine with less RAM finished faster. (This test only makes sense on servers with different computing power; on machines with identical characteristics the results will be more or less the same.) The results obviously don't depend on the amount of RAM; they are probably attributable to the nuances of the provider's cloud infrastructure, because I used a virtual machine for the server and a container for the client.

2. Next I tested the disk subsystem of the server, sabbakka-1 (this test, related to I/O, only makes sense to run on the server). I executed three steps:

(a) Generated a set of test files:

$ sysbench --test=fileio --file-total-size=5G prepare

As a result, I received files with a total size of 5 GB, several times bigger than the RAM, so that the operating system cache did not affect the results.
(b) Carried out the testing:

$ sysbench --test=fileio --file-total-size=5G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run

The test ran in random read/write mode (rndrw) for 300 seconds, in one thread ("Number of threads: 1") by default.