An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning
Ji Zhang§, Yu Liu§, Ke Zhou§ , Guoliang Li‡, Zhili Xiao†, Bin Cheng†, Jiashu Xing†, Yangtao Wang§, Tianheng Cheng§, Li Liu§, Minwei Ran§, and Zekang Li§ §Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, China ‡Tsinghua University, China, †Tencent Inc., China {jizhang, liu_yu, k.zhou, ytwbruce, vic, lillian_hust, mwran, zekangli}@hust.edu.cn [email protected]; {tomxiao, bencheng, flacroxing}@tencent.com
ABSTRACT

Configuration tuning is vital to optimize the performance of a database management system (DBMS). It becomes more tedious and urgent for cloud databases (CDB) due to the diverse database instances and query workloads, which make the database administrator (DBA) incompetent. Although there are some studies on automatic DBMS configuration tuning, they have several limitations. Firstly, they adopt a pipelined learning model but cannot optimize the overall performance in an end-to-end manner. Secondly, they rely on large-scale high-quality training samples, which are hard to obtain. Thirdly, there are a large number of knobs that are in continuous space and have unseen dependencies, and they cannot recommend reasonable configurations in such a high-dimensional continuous space. Lastly, in a cloud environment, they can hardly cope with changes of hardware configurations and workloads, and have poor adaptability.

To address these challenges, we design an end-to-end automatic CDB tuning system, CDBTune, using deep reinforcement learning (RL). CDBTune utilizes the deep deterministic policy gradient method to find the optimal configurations in high-dimensional continuous space. CDBTune adopts a try-and-error strategy to learn knob settings with a limited number of samples to accomplish the initial training, which alleviates the difficulty of collecting massive high-quality samples. CDBTune adopts the reward-feedback mechanism in RL instead of traditional regression, which enables end-to-end learning, accelerates the convergence speed of our model, and improves the efficiency of online tuning. We conducted extensive experiments under 6 different workloads on real cloud databases to demonstrate the superiority of CDBTune. Experimental results showed that CDBTune had good adaptability and significantly outperformed the state-of-the-art tuning tools and DBA experts.

SIGMOD '19, June 30-July 5, 2019, Amsterdam, Netherlands
© 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-5643-5/19/06.
https://doi.org/10.1145/3299869.3300085

1 INTRODUCTION

The performance of database management systems (DBMSs) relies on hundreds of tunable configuration knobs. Superior knob settings can improve the performance of DBMSs (e.g., higher throughput and lower latency). However, only a few experienced database administrators (DBAs) master the skills of setting appropriate knob configurations. In cloud databases (CDB), even the most experienced DBAs cannot solve most of the tuning problems. Consequently, cloud database service providers face the challenge of having to tune cloud database systems for a large number of users with limited and expensive DBA experts. As a result, developing effective systems that accomplish automatic parameter configuration and optimization becomes an indispensable way to overcome this challenge.

There are two classes of representative studies in DBMS configuration tuning: search-based methods [55] and learning-based methods [4, 14, 35]. The search-based methods, e.g., BestConfig [55], search for the optimal parameters based on certain given principles. However, they have two limitations. Firstly, they spend a great amount of time searching for the optimal configurations. Secondly, they restart the search process whenever a new tuning request comes, and thus fail to utilize knowledge gained from previous tuning efforts.

The learning-based methods, e.g., OtterTune [4], utilize machine-learning techniques to collect, process and analyze knobs and recommend possible settings by learning the DBA's experiences from historical data. However, they have four limitations. Firstly, they adopt a pipelined learning model, which suffers from a severe problem: the optimal solution of the previous stage cannot guarantee the optimal solution in the latter stage, and different stages of the model may not work well with each other. Thus they cannot optimize
(a) CDB (TPC-H)  (b) CDB (Sysbench)  (c) Knobs Increase  (d) Performance surface
Figure 1: (a) and (b) show the performance of OtterTune [4] and OtterTune with deep learning over the number of samples, compared with default settings (MySQL v5.6) and configurations generated by experienced DBAs on CDB1 (developed by Tencent). (c) shows the number of tunable knobs provided by CDB in different versions. (d) shows the performance surface of CDB (Read-Write workload of Sysbench, physical memory = 8GB, disk = 100GB).

the overall performance in an end-to-end manner. Secondly, they rely on large-scale high-quality training samples, which are hard to obtain. For example, the performance of cloud databases is affected by various factors such as memory size, disk capacity, workload, CPU model and database type. It is hard to reproduce all conditions and accumulate high-quality samples. As shown in Figures 1(a) and 1(b), without high-quality samples, OtterTune [4] or OtterTune with deep learning (we reproduce OtterTune and improve its pipelined model using deep learning) can hardly gain higher performance even when provided with an increasing number of samples. Thirdly, in practice there are a large number of knobs, as shown in Figure 1(c). Existing methods cannot optimize the knob settings in high-dimensional continuous space by just using a regression method like the Gaussian Process (GP) regression OtterTune used, because the DBMS configuration tuning problem, which aims to find the optimal solution in continuous space, is NP-hard [4]. Moreover, the knobs are in continuous space and have unseen dependencies. As shown in Figure 1(d), due to nonlinear correlations and dependencies between knobs, the performance will not monotonically change in any direction. Besides, there exist countless combinations of knobs because of the continuous tunable parameter space, making it tricky to find the optimal solution. Lastly, in a cloud environment, due to the flexibility of the cloud, users often change the hardware configuration, such as adjusting the memory size and disk capacity. According to statistics from Tencent, 1,800 users made 6,700 such adjustments in half a year. In this case, conventional machine learning has poor adaptability and needs to retrain the model to adapt to the new environment.

In this paper, we design an end-to-end automatic cloud database tuning system, CDBTune, using deep reinforcement learning (RL). CDBTune uses the reward functions in RL to provide feedback for evaluating the performance of the cloud database, and proposes an end-to-end learning model based on this feedback mechanism. The end-to-end design improves the efficiency and maintainability of the system. CDBTune adopts a try-and-error method that utilizes a few samples to tune knobs for achieving higher performance, which alleviates the burden of collecting too many samples in the initial stage of modeling and is more in line with the DBA's judgements and tuning actions in real scenarios. CDBTune utilizes the deep deterministic policy gradient method to find the optimal configurations in continuous space, which solves the problem of quantization loss caused by regression in existing methods. We conducted extensive experiments under 6 different workloads on four types of databases. Our experimental results demonstrated that CDBTune can recommend knob settings that greatly improve performance, with higher throughput and lower latency, compared with existing tuning tools and DBA experts. Besides, CDBTune has good adaptability, so the performance of a CDB deployed with configurations recommended by CDBTune will not decline even when the environment (e.g., memory, disk, workloads) changes. Note that some other ML solutions can be explored to improve the database tuning performance further.

In this paper, we make the following contributions:
(1) To the best of our knowledge, this is the first end-to-end automatic database tuning system that uses deep RL to learn and recommend configurations for databases.
(2) We adopt a try-and-error manner in RL to learn the best knob settings with a limited number of samples.
(3) We design an effective reward function in RL, which enables an end-to-end tuning system, accelerates the convergence speed of our model, and improves tuning efficiency.
(4) CDBTune utilizes the deep deterministic policy gradient method to find the optimal configurations in high-dimensional continuous space.
(5) Experimental results demonstrate that CDBTune, with good adaptability, recommends knob settings that greatly improve performance compared with the state-of-the-art tuning tools and DBA experts. Our system is open-sourced and publicly available on GitHub2.

2 SYSTEM OVERVIEW

In this section, we present our end-to-end automatic cloud database tuning system using deep RL. We first introduce the working mechanism of CDBTune (Section 2.1) and then present the architecture of CDBTune (Section 2.2).

1https://intl.cloud.tencent.com
2https://github.com/HustAIsGroup/CDBTune

2.1 CDBTune Working Mechanism

CDBTune first trains a model based on some training data. Then, given an online tuning request, CDBTune utilizes the
model to recommend knob settings. CDBTune also updates the model by taking the tuning request as training data.

2.1.1 Offline Training. We first briefly introduce the basic idea of the training model (more details will be discussed in Sections 3 and 4) and then present how to collect the training data.

Training Data. The training data is a set of training quadruples ⟨q, a, s, r⟩, where q is a set of query workloads (i.e., SQL queries), a is a set of knobs as well as their values when processing q, s is the database state (a set of 63 metrics) when processing q, and r is the performance when processing q (including throughput and latency). We use the SQL command "show status" to get the state s, which is a common command that DBAs use to understand the state of the database. The state metrics keep the statistical information of the CDB; they describe the current state of the database and are called internal metrics. There are 63 internal metrics in CDB, including 14 state values and 49 cumulative values. Example state metrics include buffer size, page size, etc., and cumulative values include data reads, lock timeouts, buffer pool in pages, buffer pool read/write requests, lock time, etc. All the collected metrics and knob data are stored in the memory pool (see Section 2.2.4).

Training Model. Because the DBMS configuration tuning problem, which aims to find the optimal solution in continuous space, is NP-hard [4], we use deep RL as the training model. RL adopts a try-and-error strategy to train the model, which explores more optimal configurations that the DBA never tried, reducing the possibility of falling into a local optimum. Note that the RL model is trained once offline and is then used to tune the database knobs for each tuning request from database users. The details of the training model will be introduced in Section 3.

Training Data Generation. There are two ways to collect the training data. (1) Cold Start. Because of the lack of historical experience data at the beginning of the offline training process, we utilize standard workload testing tools (such as Sysbench) to generate a set of query workloads. Then, for each query workload q, we execute it on CDB and get the initial quadruple. Then we use the above try-and-error strategy to train on the quadruples and get more training data. (2) Incremental Training. During the later practical use of CDBTune, for each user tuning request, our system continuously gains feedback information from the user request according to the configurations CDBTune recommends. By gradually adding more real user behavior data to the training process, CDBTune will further strengthen the model and improve its recommendation accuracy.

Figure 2: System Architecture.

2.1.2 Online Tuning. If a user wants to tune her database, she just needs to submit a tuning request to CDBTune, which is in line with existing tuning tools like OtterTune and BestConfig. Once receiving an online tuning request from a user, CDBTune collects the query workload q from the user over the most recent period (about 150 seconds), gets the current knob configuration a, and executes the query workload in CDB to generate the current state s and performance r. Next it uses the model obtained by offline training to do online tuning. Eventually, the knobs corresponding to the best performance in online tuning are recommended to the user. When the tuning process terminates, we also need to update the deep RL model and the memory pool. The reason why we update the memory pool is that the samples produced by RL are generally sequential (such as configuration tuning step by step), which does not conform to the i.i.d. hypothesis between samples in deep learning. Based on this, we randomly extract some batches of samples each time and update the model in order to eliminate the correlations between samples. Note that the user's database workload in the online tuning process is different from the standard workload stored in the database. Hence, CDBTune needs to fine-tune the pre-trained model in order to adapt to the real workload. There are mainly two differences between online tuning and offline training. On one hand, we no longer utilize the simulated data. Instead, we replay the user's current workload (see Section 2.2.1) to conduct stress testing on CDB in order to fine-tune the model. On the other hand, the tuning terminates if the user obtains a satisfactory performance improvement over the initial configuration or the number of tuning steps reaches the predefined maximum. In our experiment, we set the maximum number to 5.

2.2 System Architecture

Figure 2 illustrates the architecture of CDBTune. The dotted box on the left represents the client, where users send their requests to the server through the local interface. The other dotted box represents our tuning system in the cloud, in which the controller on the distributed cloud platform exchanges information among the client, CDB and CDBTune. When the user initiates a tuning request or the DBA initiates a training request via the controller, the workload generator conducts stress testing on the CDB instances that remain to be tuned, by simulating workloads or replaying the user's workloads. At the same time, the metrics collector collects and processes the related metrics. The processed data are stored in the memory pool and fed into the deep RL network, respectively. Finally, the recommender outputs the knob configurations, which are deployed on CDB.

2.2.1 Workload Generator. As described above, the workload generator mainly completes two tasks: generating the standard workload testing and replaying the current user's real workload. Due to the lack of samples (historical experience data) in initial training, we utilize standard workload testing tools such as Sysbench3/TPCC-MySQL4, combined with the try-and-error manner of RL, to collect simulated data, avoiding relying heavily on real data. In this way, a standard (pre-training) model is established. When we have accumulated a certain amount of feedback data from the user after recommending configurations, we use the replay mechanism of the workload generator to collect the user's SQL records over a period of time and then execute them under the same environment, so as to restore the user's real behavior data. By this means, the model can grasp the real state of the user's database instances more accurately and further recommend superior configurations.

2.2.2 Metrics Collector. When tuning CDB upon a tuning request, we collect and process the metrics data, which capture many aspects of CDB runtime behavior in a certain time interval. Since the 63 metrics represent the current state of the database and are fed to the deep RL model in the form of vectors, we need to process them in a simple way. For example, we take the average of each state value over a certain time interval and compute the difference of each cumulative value over the same interval. As for the external metrics (latency and throughput), we take samples every 5 seconds and then simply calculate the mean of the sampled results to compute the reward (which represents how the performance of the current database changes after applying the corresponding knobs; see Section 4.2). Note that the DBA also collects the average value of these metrics by executing the "show status" command during tuning tasks. Although these values may change over time, the average value can well describe the database state. This method is intuitive and simple, and the experiments also validate its effectiveness. We also tried other methods. For example, we replaced the average value with the peak and trough values of metrics over a period of time, which only grasp the local state of the database. Experimental results show that the peak and trough values are not as good as the average value due to the lack of an accurate grasp of the database state. Last but not least, as we use the "show status" command to get the database states, this does not affect deployment under different workloads, environments and settings.

2.2.3 Recommender. When the deep RL model outputs the recommended configurations, the recommender generates the corresponding parameter-setting commands and sends the request to modify the configurations to the controller. After acquiring the DBA's or user's license, the controller deploys the above configurations on CDB's instances.

2.2.4 Memory Pool. As mentioned above, we use the memory pool to store the training samples. Generally speaking, each experience sample contains four categories of information: the state of the current database st (vectorized internal metrics), the reward value rt calculated by the reward function (which will be introduced in Section 4.2) via external metrics, the knobs of the database to be executed at, and the database's state vector after executing the configurations st+1. A sample can be written as (st, rt, at, st+1), which is called a transition. Like the DBA's brain, it constantly accumulates data and replays experience, so we call it the experience replay memory.

3 RL IN CDBTUNE

To simulate the try-and-error method that the DBA adopts and overcome the shortcomings caused by regression, we introduce RL, which originates from the method of try-and-error in animal learning psychology and is a key technology to solve NP-hard problems of database tuning in continuous space. All the notations used are listed in Appendix A.

3.1 Basic Idea

Both the search-based approach and the multistep learning-based approach suffer from some limitations, so we desire to design an efficient end-to-end tuning system. At the same time, we hope that our model can learn well with limited samples in the initial training and simulate the DBA's train of thought as much as possible. Therefore, we tried the RL method. At the beginning, we tried the most classic Q-learning and DQN models in RL, but both of these methods failed to solve the problems of high-dimensional space (database state, knob combinations) and continuous actions (continuous knobs)5. Eventually, we adopted the policy-based Deep Deterministic Policy Gradient approach, which overcomes the above shortcomings effectively. In addition, as the soul of RL, the design of the reward function (RF) is vital, which directly […]

3https://github.com/akopytov/sysbench
4https://github.com/Percona-Lab/tpcc-mysql
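The transition storage and random batch sampling described in Sections 2.1.2 and 2.2.4 can be sketched as follows. This is a minimal pure-Python sketch; the class name, capacity, and eviction policy are illustrative assumptions, not details given in the paper:

```python
import random
from collections import namedtuple

# A transition (s_t, r_t, a_t, s_t+1) as described in Section 2.2.4.
Transition = namedtuple("Transition", ["state", "reward", "action", "next_state"])

class ExperienceReplayMemory:
    """Accumulates transitions and returns random batches, so that the
    sequential samples produced by RL better approximate the i.i.d.
    assumption made by deep learning (Section 2.1.2)."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.buffer = []

    def push(self, state, reward, action, next_state):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)  # evict the oldest transition
        self.buffer.append(Transition(state, reward, action, next_state))

    def sample(self, batch_size):
        # Random extraction breaks the correlation between consecutive
        # tuning steps before the batch is used to update the model.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

The paper additionally reports using prioritized experience replay during offline training (Section 5); the uniform sampling above is the plain variant.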
5It is interesting to study how to wisely discretize the knobs to enable The algorithm contains seven main steps (the pseudo code Q-learning or DQN to support this problem. is summarized in Appendix B.1). 6 'IZUX )XOZOI the opponent will do next). However, in our DBMS tuning, 1TUH)UTLOM[XGZOUTY Ŧ 7 tween knobs. Because of this, with relatively fewer samples, Ţ Ȝ Ţ Ȝ Ȝ Ȝ it is easier to learn the DBMS tuning model than game prob- lems. Secondly, when modeling our CDBTune, we adopt few Ȝ Ȝ input (63 dimensions) and knobs output (266 dimensions), Ȝ Ȝ Ȝ which makes the network efficiently converge without too /TV[Z Ȝ /TV[Z Ȝ /TV[Z Ȝ many samples. Thirdly, RL essentially requires diverse sam- /TZKXTGR3KZXOIY 1TUH)UTLOM[XGZOUTY /TZKXTGR3KZXOIY "/3/3 Ȝ/3T$ 'IZOUT 9ZGZK ples, not only massive samples. For example, RL solves the game problem by processing each frame of the game screen Figure 4: DDPG for CDBTune. to form initial training samples. Each frame time is very short, leading to high training images redundancy. Instead, Step 1. We first extract a batch of transition (st ,rt , at ,st+1) for DBMS tuning, we will change the parameters of database from the experience replay memory. and collect the performance data. These data are diverse in our learning process. RL can constantly update and optimize Step 2. We feed s to the actor network and output the t+1 the performance with these diverse data. Lastly, coupled knob settings a′ to be executed at next moment. t+1 with our efficient reward function, our method performs Step 3. We get the value (score) Vt+1 after sending st+1 and ′ effectively with a small number of diverse samples. at+1 to the critic network. In summary, the DDPG algorithm makes it feasible for Step 4. According to Q-Learning algorithm, Vt+1 is multi- deep neural networks to process high-dimensional states and plied by discount factor γ and added by the value of reward generate continuous actions. 
DQN is not able to directly map ′ at time t, and now we can estimate the value of Vt of the states to continuous actions for maximizing the action-value current state st . function. In DDPG, the actor can straightforwardly predict Step 5. We feed st (obtained at the first step) to the critic the values for all tunable knobs at the same time without network and further acquire the value Vt of the current state. considering the Q-value with specific action and state. ′ Step 6. We compute the square difference between Vt and Vt and optimize parameter θQ of the critic network by gradient 4.2 Reward Function descent. The reward function is vital in RL, which provides impactful feedback information between the agent and environment. Step 7. We use Q(s = s , µ(s )|θQ ) outputted by the critic net- t t We desire that the reward can simulate the DBA’s empirical work as the loss function, and adopt gradient descent means judgement for real environment in the tuning process. to guide the update of the actor network in order that the Next we describe how CDBTune simulates DBA’s tuning critic network gives a higher score for the recommendation process to design reward functions. First, we introduce DBA’s outputted by the actor network each time. tuning process as follows: To make it easier to understand and implement our al- (1) Suppose that the initial performance of DBMS is D and gorithm, we elaborate the network structure and specific 0 the final performance tuned by DBA is D . parameters values of DDPG in Table 5 of Appendix B.2. Note n (2) DBA tunes the knobs and the performance becomes D that we also discuss the impact of recommended configura- 1 after the first tuning. Then DBA computes the performance tions by different networks on the system’s performance in change ∆(D , D ). Appendix C.2. 1 0 (3) At the i-th tuning iteration, DBA expects that the current Remark. 
Traditional machine learning methods rely on mas- performance is better than that of the previous one (i.e., Di is sive training samples to train the model while we adopt the better than Di−1 where i < n), because DBA aims to improve try-and-error method to make our model generate diversi- the performance through the tuning. However, DBA cannot fied samples and learn towards an optimizing direction by guarantee Di is better than Di−1 at every iteration. To this using deep reinforcement learning, which can reasonably end, DBA compares (a) Di and D0 and (b) Di and Di−1. If Di use limited samples to achieve great effects. The main rea- is better than D0, the tuning trend is correct and the reward son why we can achieve great result with limited samples is is positive; otherwise the reward is negative. The reward summarized as follows. Firstly, for solving traditional game value is calculated based on ∆(Di , D0) and ∆(Di , Di−1). problems, we cannot evaluate how the environment will Based on the above idea, we model the tuning method change after taking an action, because the environment of of DBAs, which not only considers the change of perfor- game is random (for example, in Go, we do not know what mance at the previous time but also the initial time (when 7 the database needs to be tuned). Formally, let r, T and L de- generate diverse samples and learn towards an optimizing note reward, throughput and latency. Especially, T0 and L0 direction by using deep RL which can reasonably use lim- are respectively the throughput and latency before tuning. ited samples to achieve great effects. (2) High-dimensional We design the reward function as follows. Continuous Knobs Recommendation. The DDPG method At time t, we calculate the rate of performance change ∆ can recommend a better configuration recommendation in from time t − 1 and the initial time to time t respectively. 
high-dimensional continuous space than simple regression The detailed formula is shown as follows: OtterTune used. (3) End-to-End Approach. The end-to-end Tt − T0 approach reduces the error caused by multiple segmented ∆T t→0 = T0 tasks and improves the precision of recommended configu- ∆T = (4) ration. Moreover, the importance degree of different knobs Tt − Tt−1 ∆T t→t−1 = is treated as an abstract feature which can be perceived and Tt−1 automatically accomplished by deep neural network instead −Lt + L0 of using an extra method to rank the importance degree of ∆Lt→0 = L0 ∆L = (5) different knobs (see Section 5.2.2). (4) Reducing the Possi- −Lt + Lt−1 bility of Local Optimum. CDBTune may not find the global ∆Lt→t−1 = L − optimum, but RL adopts the well-known exploration & ex- t 1 According to Eq. (4) and Eq. (5), we design the reward ploitation dilemma, which not only gives full play to the function below: ability of the model, but also explores more optimal configu- ( 2 rations that the DBA never tried, reducing the possibility of ((1 + ∆t→0) − 1)|1 + ∆t→t−1|,∆t→0 > 0 r = (6) trapping in local optimum. (5) Good Adaptability. Different −(( − )2 − )| − | 1 ∆t→0 1 1 ∆t→t−1 ,∆t→0 ⩽ 0 from supervised or unsupervised learning, RL has the ability Considering that the ultimate goal of tuning is to achieve to simulate the human brain to learn as much as possible in better performance than the initial settings, we need to re- a reasonable direction from experience rather than a specific duce the impact of the intermediate process of tuning for value, which does not depend on labels or training data too designing reward function. Therefore, when the result in much and adapts to different workloads and hardware con- Eq. (6) is positive and ∆t→t−1 is negative, we set the r = 0. figurations in cloud environment with a good adaptability. 
According to the above steps we can calculate the re- Note some other ML solutions can be explored to improve wards of throughput and latency rT and rL. Then we multiply the database tuning performance further. these two rewards by different coefficients CL and CT , where CL + CT = 1. Note that the coefficients can be set based on 5 EXPERIMENTAL STUDY user preferences. We use r to denote the sum of rewards of In this section, we evaluate the performance of CDBTune and throughput and latency: compare with existing approaches. We first show the exe- r = CT ∗ rT + CL ∗ rL (7) cution time of our model, then compare with baselines and show the performance of varying tuning steps. Then, we If the goal of optimization is throughput and latency, our compare the performance of CDBTune with BestConfig [55], reward function does not need to change, because the reward OtterTune [4], and 3 DBA experts who have been engaged function is independent to the hardware environment and in tuning and optimizing DBMSs for 12 years in Tencent. Fi- workload changes, and it only depends on the optimization nally, we verify the adaptability of the model under different goal. Thus, the reward function needs to be redesigned only conditions. We also provide more experiments in Appen- if the optimization goals are changed. dix C, including different reward functions, different neural Note other reward functions can be integrated into our networks and other databases. system. For verifying the superiority of our designed reward function in the training and tuning process, we compare it Workload. Our experiments are conducted using three kinds with other three typical reward functions in Appendix C.1.1. of benchmark tools: Sysbench, MySQL-TPCH and TPC-MySQL. We carry out the experiments with 6 workloads consisting of Moreover, we explore how the two coefficients CL and CT will affect the performance of DBMS in Appendix C.1.2. 
Workload. Our experiments are conducted using three kinds of benchmark tools: Sysbench, MySQL-TPCH and TPC-MySQL. We carry out the experiments with 6 workloads consisting of the read-only, write-only and read-write workloads of Sysbench, TPC-H workloads (http://www.tpc.org/tpch), TPC-C workloads and YCSB workloads, which is similar to existing work. Under the Sysbench workloads, we set up 16 tables, each of which contains about 200K records (about 8.5GB), and the number of threads is set to 1500. For TPC-C (OLTP), we select the database consisting of 200 warehouses (about 12.8GB) and set the number of concurrent connections to 32. The TPC-H (OLAP) workloads contain 16 tables (about 16GB). For YCSB (OLTP), we generate 35GB of data using 50 threads and 20M operations. In addition, the read-only, write-only and read-write workloads of Sysbench are abbreviated as RO, WO and RW respectively.

4.3 Advantages

The advantages of our method are summarized as follows. (1) Limited Samples. In the absence of high-quality empirical data, accumulating experience via a try-and-error method reduces the difficulty of data acquisition and makes our model

When we conduct online tuning using a model trained under another condition, we write M_{training condition}→{tuning condition}. For example, if we use 8GB RAM as the training setting and use the model for online tuning on 12GB RAM, we denote it as M_8G→12G.

DBA Data. It is hard to collect a large amount of DBA experience tuning data. In order to reproduce and compare with OtterTune [4], we utilize all the DBA experience data we accumulated, as well as the training data we used on CDBTune, to train OtterTune. The proportion of these two kinds of data is about 1:20. For example, considering recommending a configuration for 266 knobs, CDBTune collects 1500 samples, while besides these samples OtterTune also uses 75 historical DBA experience tuning records.

Setting. Our CDBTune is implemented using PyTorch (https://pytorch.org) and Python tools including the scikit-learn library (http://scikit-learn.org/stable). All the experiments are run on Tencent's cloud servers with a 12-core 4.0GHz CPU, 64GB RAM and a 200GB disk. We use seven types of CDB instances in the evaluations; their hardware configurations are shown in Table 1. The differences are mainly in memory size and disk capacity.

Table 1: Database instances and hardware configuration.

Instance | RAM (GB)             | Disk (GB)
CDB-A    | 8                    | 100
CDB-B    | 12                   | 100
CDB-C    | 12                   | 200
CDB-D    | 16                   | 200
CDB-E    | 32                   | 300
CDB-X1   | (4, 12, 32, 64, 128) | 100
CDB-X2   | 12                   | (32, 64, 100, 256, 512)

For fair comparison, in all experiments we select the best result of the recommendations of CDBTune and OtterTune in the first 5 steps. Considering that BestConfig (a search-based method) needs to restart its search each time, which takes a lot of time, we give it 50 steps in the experiments. When the 50 steps are finished, we suspend BestConfig and use the recommended configuration corresponding to its best performance. Last but not least, to improve the offline training performance, we add prioritized experience replay [38] to accelerate convergence, which doubles the convergence speed (halving the number of iterations). We also adopt parallel computing (30 servers), which greatly reduces the offline training time (we do not use parallel computing for online tuning).

5.1 Efficiency Comparison

We first evaluate the execution time details of our method, then compare with baselines and show the performance of varying tuning steps.

5.1.1 Execution Time Details of CDBTune. In order to know how long a step takes in the training and tuning process, we record the average runtime of each step. The runtime of each step is 5 minutes, which is mainly divided into 5 parts (excluding 2 minutes to restart the CDB) as follows:

(1) Stress Testing Time (152.88 sec): runtime of the workload generator for collecting current metrics of the database.
(2) Metrics Collection Time (0.86 ms): runtime of obtaining state vectors from internal metrics and calculating the reward from external metrics.
(3) Model Update Time (28.76 ms): runtime of forward computation and back-propagation in the network during one training step.
(4) Recommendation Time (2.16 ms): runtime from inputting the database state to outputting the recommended knobs.
(5) Deployment Time (16.68 sec): runtime from outputting the recommended knobs to deploying the configurations via CDB's API interface.

Offline Training. CDBTune takes about 4.7 hours for 266 knobs and 2.3 hours for 65 knobs in offline training. Note that the number of knobs affects the offline training time but does not affect the online tuning time.

Online Tuning. For each tuning request, we run CDBTune for 5 steps, and the online tuning time is 25 minutes.

5.1.2 Comparison with Baselines. We compare the online tuning efficiency of CDBTune with OtterTune [4], BestConfig [55] and the DBAs. Note that only CDBTune requires offline training, but it trains the model once and uses it for all online tuning, while OtterTune has to train a model for every online tuning request and BestConfig has to search online. As shown in Table 2, for each tuning request, OtterTune takes 55 minutes, BestConfig takes about 250 minutes, and the DBAs take 8.6 hours, while CDBTune takes 25 minutes. Note that we invited 3 DBAs to tune the parameters and selected the best of their results, which take 8.6 hours per tuning request. (We recorded 57 tuning requests from the 3 DBAs, which took 491 hours in total.) It takes a DBA about 2 hours to repeatedly replay the workload and detect the factors that affect DBMS performance (e.g., analyzing the most time-consuming functions in the source code, locating the reason, and finding the corresponding knobs to tune). This process usually requires rich experience and a lot of time.
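The prioritized experience replay [38] used above to halve the number of training iterations can be sketched as follows. This is a deliberately simplified proportional-prioritization buffer (no sum-tree, no importance-sampling correction); class and method names are our own, not CDBTune's actual implementation.

```python
import random

class PrioritizedReplayBuffer:
    """Simplified proportional prioritized experience replay (after Schaul et al.).

    Transitions with larger TD-error get sampled more often, so informative
    try-and-error samples are replayed first and training converges faster.
    """

    def __init__(self, capacity: int, alpha: float = 0.6) -> None:
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.buffer = []
        self.priorities = []

    def push(self, transition, td_error: float = 1.0) -> None:
        if len(self.buffer) >= self.capacity:   # evict the oldest transition
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size: int):
        # Proportional sampling with replacement: P(i) ~ priority_i.
        return random.choices(self.buffer, weights=self.priorities, k=batch_size)
```

A DDPG trainer would push (state, action, reward, next_state) tuples together with their TD-errors and draw mini-batches via sample().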
OtterTune adopts simple GP regression, and the knobs it recommends are not accurate; therefore, it has to run more trials to achieve better performance. BestConfig restarts the entire search whenever a new tuning request comes and fails to utilize knowledge gained from previous tuning efforts, so it requires a lot of trial time.

Table 2: Detailed online tuning steps and time of CDBTune and other tools.

Tuning Tools | Total Steps | Time of One Step (mins) | Total Time (mins)
CDBTune      | 5           | 5                       | 25
OtterTune    | 5           | 11                      | 55
BestConfig   | 50          | 5                       | 250
DBA          | 1           | 516                     | 516

5.1.3 Varying Tuning Steps. When recommending configurations for the user, we need to replay the current user's workload and conduct stress testing on the instance. In this process, we fine-tune the standard model with a limited number of steps, called accumulated trying steps. Hence, how many steps does it take to achieve the desired performance? In most cases, more steps bring better results. However, due to the exploration-exploitation dilemma in DDPG, knob settings that past experience never tried will be obtained with a small probability. These outliers may either degrade or improve performance to an unprecedented level. We set 5 steps as a statistical increment unit to observe the system's performance and take the recommended result corresponding to the optimal performance. We carry out experiments with CDB-A on the three kinds of Sysbench workloads, as shown in Figure 5, where the horizontal coordinate represents the number of tuning steps before recommending configurations and the vertical coordinate represents the value of throughput or latency.

Figure 5: Performance by increasing number of steps.

It is obvious that the standard model gradually adapts to the current workload through fine-tuning operations as the number of steps increases, which brings continuous improvement to the performance. Also, compared with OtterTune and DBA, CDBTune already achieves a better result in the first 5 steps in all cases, indicating that our model is highly efficient; we believe our algorithm really draws lessons from past experience and performs excellently. Certainly, the user will get better configurations and higher performance by accepting a longer tuning time. However, OtterTune stays flat as the number of iterations increases, which is caused by the characteristics of supervised learning and regression.

5.2 Effectiveness Comparison

In this section, we evaluate the effect of different numbers of knobs and discuss the performance of CDBTune, DBA, OtterTune and BestConfig under different workloads. With the database instance CDB-B, we record the throughput, latency, and number of iterations for the TPC-C workload when the model converges. Note that some knobs do not need to be tuned, e.g., those that make no sense to tune (e.g., path names) or those that are not allowed to be tuned (which may lead to hidden or serious problems). Such knobs are added to a black-list according to the DBA's or user's demand. We finally sort 266 tunable knobs (the maximum number of knobs that the DBA uses to tune CDB).

5.2.1 Knobs Selected by DBA and OtterTune. Both DBA and OtterTune rank the knobs based on their importance to database performance. We use their orders to sort the 266 knobs and select increasing numbers of knobs following each order, in order to tune and compare the different methods. Figures 6 and 7 show the performance with CDB-B under the TPC-C workload based on the orders of DBA and OtterTune, respectively.

Figure 6: Performance by increasing number of knobs (knobs sorted by DBA).

Figure 7: Performance by increasing number of knobs (knobs sorted by OtterTune).

We can see from the results that CDBTune achieves better performance in all cases. Note that the performance of DBA and OtterTune begins to decrease after their recommended knobs exceed a certain number. The main reason is that the unseen dependencies between knobs become more complex in a larger knob space, and DBA and OtterTune cannot recommend reasonable configurations in such a high-dimensional continuous space. This experiment demonstrates that CDBTune is able to recommend better configurations than DBA and OtterTune in high-dimensional continuous space.

Figure 8: Performance by increasing number of knobs (knobs randomly selected by CDBTune).
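The sweep behind Figures 6-8 (tune the top-k knobs of a ranked list for increasing k and record the resulting performance) can be paraphrased as the loop below; `ranked_knobs` and `evaluate_configuration` are hypothetical stand-ins for the actual benchmark harness.

```python
def knob_sweep(ranked_knobs, evaluate_configuration, step: int = 20):
    """Evaluate performance while growing the tuned-knob subset by `step` at a time.

    Each subset is a prefix of the ranking, so the knobs of round k are always
    contained in the knobs of round k + step, as in the experiments above.
    """
    results = {}
    for k in range(step, len(ranked_knobs) + 1, step):
        subset = ranked_knobs[:k]                      # top-k knobs by the ranking
        results[k] = evaluate_configuration(subset)    # e.g. (throughput, latency)
    return results
```

The same loop serves both orderings (DBA's and OtterTune's); only the ranked list changes.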
5.2.2 Knobs Randomly Selected by CDBTune. CDBTune randomly selects different numbers of knobs (note that, e.g., the 40 selected knobs must contain the 20 knobs selected in the previous round) and records the performance of CDB-B under the TPC-C workload after executing these configurations. As shown in Figure 8, as the number of knobs increases from 20 to 266, the performance of the configurations recommended by CDBTune continuously improves. The performance is poor at the beginning because a small number of selected knobs has only a small impact on performance. Besides, after the number of knobs reaches a certain point, the performance tends to be stable, because the knobs added later do not greatly affect the performance. This experiment demonstrates that while DBA and OtterTune need a separate step to rank the importance of knobs, our CDBTune completes this process (feature extraction) automatically through the deep neural network without an additional ranking step, which is in line with our original intention of designing an end-to-end model.

In addition, the input and output of the network become larger as the number of knobs increases, so CDBTune takes more steps in the offline training process. Therefore, we use prioritized experience replay and adopt parallel computing to accelerate the convergence of our model. According to the time cost of each step mentioned in Section 5.1, the average time spent on offline training is about 4.7 hours. This time would be further shortened if a GPU were used.

5.2.3 Performance Improvement. We also evaluate our method on different workloads with CDB-A; the result is shown in Figure 9. The tuning performance improvement percentages compared with BestConfig, DBA and OtterTune are shown in Table 3.

Table 3: Higher throughput (T) and lower latency (L) of CDBTune than BestConfig, DBA and OtterTune.

Workload | BestConfig (T / L) | DBA (T / L)       | OtterTune (T / L)
RW       | ↑68.28% / ↓51.65%  | ↑4.48% / ↓8.91%   | ↑29.80% / ↓35.51%
RO       | ↑42.15% / ↓43.95%  | ↑4.73% / ↓11.66%  | ↑44.46% / ↓23.63%
WO       | ↑128.66% / ↓61.35% | ↑46.57% / ↓43.33% | ↑91.25% / ↓59.27%

It can be seen that CDBTune achieves higher performance than OtterTune, which in turn is better than BestConfig. Consequently, the learning-based method is more effective, and our algorithm obtains the state-of-the-art result. Besides, OtterTune performs worse than the DBA in most cases; this is because we use the try-and-error samples of RL instead of massive high-quality DBA experience tuning data. Compared with BestConfig, we find that CDBTune greatly outperforms it, because in a short time BestConfig can hardly find the optimal configurations without any past experience in a high-dimensional space. This verifies that the learning-based approach has an overwhelming advantage over search-based tuning in reaching a good solution quickly, and it also verifies the superiority of CDBTune.

CDBTune achieves better performance than the other candidates, and it gains an especially remarkable improvement under the write-only workload. We observe that, besides enlarging the buffer pool, the configurations CDBTune recommends also expand the log file size appropriately. In addition, innodb_read_io_threads increases under the RO workload, while both innodb_write_io_threads and innodb_purge_threads become appropriately larger when the workload is WO or RW. This shows that our model can properly tune knobs under different workloads, which improves CPU utilization as well as database performance. Of course, for knobs like innodb_file_per_table, max_binlog_size and skip_name_resolve, we obtain the same values as the DBA advisors. According to the MySQL official manual, the product of innodb_log_files_in_group and innodb_log_file_size is not allowed to be greater than the disk capacity.
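The MySQL rule just mentioned can be checked mechanically; the sketch below is our own illustrative validator (the function name and byte units are assumptions), not a CDBTune component.

```python
def log_config_fits_disk(log_files_in_group: int,
                         log_file_size: int,
                         disk_capacity: int) -> bool:
    """True iff innodb_log_files_in_group * innodb_log_file_size fits the disk.

    All sizes are in bytes. Violating this bound lets the redo log fill the
    disk so that no more data can be written, crashing the instance.
    """
    return log_files_in_group * log_file_size <= disk_capacity

GiB = 1024 ** 3
# 2 x 4 GiB of redo log easily fits a 100 GB disk:
ok = log_config_fits_disk(2, 4 * GiB, 100 * 10**9)    # True
# 2 x 48 GiB does not (96 GiB is about 103 GB > 100 GB):
bad = log_config_fits_disk(2, 48 * GiB, 100 * 10**9)  # False
```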
Also, we find that during the real training process of our model, the CDB instance easily crashes once this product exceeds the threshold, because the log files take up too much disk space so that no more data can be written. An interesting finding is that, faced with this situation, we do not limit the ranges of these two parameters but instead give a large negative reward (e.g., -100) as punishment. The practical results verify that this method works well: with the constant reward feedback of RL, the situation occurs less and less often and eventually disappears as training goes on, although crashes may occur frequently in the initial training. Ultimately, the product of the two values recommended by our model is reasonable and does not cause a crash.

Figure 9: Performance comparison for Sysbench RW, RO and WO workloads among CDBTune, MySQL default, BestConfig, CDB default, DBA and OtterTune.

5.3 Adaptability

We evaluate the adaptability, i.e., how a method can adapt to a new environment or a new workload.

5.3.1 Adaptability on Memory Size and Disk Capacity Change. Compared with local self-built databases, one of the biggest advantages of cloud databases is that data migration or even downtime reloading is hardly required when resources need to be adjusted. Usually, memory size and disk capacity are the two properties that users most prefer to adjust. In the cloud environment, so many different users own their respective cloud database memory sizes and disk capacities that we cannot build a corresponding model for each one. Therefore, the cloud environment naturally requires DBMS tuning models to have good adaptability. To verify that CDBTune can greatly optimize the database's performance under different hardware configurations, we use the database instances CDB-A (8G RAM, 100G disk), CDB-X1 (XG RAM, 100G disk) where X is selected from (4, 12, 32, 64, 128), CDB-C (12G RAM, 200G disk) and CDB-X2 (12G RAM, XG disk) where X is selected from (32, 64, 100, 256, 512). Note that CDB-A and CDB-X1 differ only in memory size, while CDB-C and CDB-X2 differ only in disk capacity. For different memory sizes, under write-only workloads, we first directly utilize the model M_A→X1 trained on CDB-A to recommend configurations for CDB-X1 (cross testing), then use the model M_X1→X1 trained on CDB-X1 to recommend configurations for CDB-X1 (normal testing), and finally compare the performance after executing these two configurations. Similarly, for different disk capacities, we use the same approach to complete cross testing and normal testing on CDB-C and CDB-X2.

Figure 10: Performance comparison for Sysbench WO workload when applying the model trained on 8G memory to (X)G memory hardware environment.

As shown in Figure 10 and Figure 11, the cross-testing model achieves almost the same performance as the normal-testing model. Moreover, both models achieve better performance than OtterTune, BestConfig and the DBAs employed by Tencent's cloud database, indicating that CDBTune does not need to establish a new model and has strong adaptability: it can adapt to a new hardware environment no matter how the user's memory size or disk capacity changes.
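The crash-handling scheme described in this section (no hard range limits on the two log knobs; a large negative reward, e.g. -100, whenever a recommended configuration crashes the instance) can be sketched as follows. `apply_and_benchmark` is a hypothetical callback standing in for deploying the knobs and stress-testing the instance; it is not part of CDBTune's published interface.

```python
CRASH_PENALTY = -100.0  # large negative reward instead of hard knob-range limits

def step_reward(apply_and_benchmark, knobs) -> float:
    """Return the normal reward, or CRASH_PENALTY if the DBMS instance crashes."""
    try:
        # Deploy the knobs, replay the workload, and compute the reward.
        return apply_and_benchmark(knobs)
    except RuntimeError:
        # A crash (e.g., redo log filling the disk) surfaces as an exception.
        return CRASH_PENALTY
```

With this feedback, the agent learns to avoid crash-inducing regions of the knob space (such as oversized redo logs) without any explicit constraint, matching the behavior reported above.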
BestConfig do not search high-dimensional configuration