Smart Build Targets Batching Service at Google

Smart Build Targets Batching Service at Google Kaiyuan Wang Daniel Rall Greg Tener Google Inc Google Inc Google Inc [email protected] [email protected] [email protected] Vijay Gullapalli Xin Huang Ahmed Gad Google Inc Google Inc Google Inc [email protected] [email protected] [email protected] Abstract—Google has a monolithic codebase with tens of as it exceeds the memory of a single machine for dependency millions build targets. Each build target specifies the information analysis. Moreover, Google’s infrastructure limits the number that is needed to build a software artifact or run tests. It is of executors, e.g. CPU, GPU and TPU, a build can use in common to execute a subset of build targets at each revision and make sure that the change does not break the codebase. Google’s parallel at the build execution time. So executing too many build service system uses Bazel to build targets. Bazel takes as targets in a build may cause the build to take too long and input a build that specifies the execution context, flags and build exceed its invocation deadline, resulting in deadline exceeded targets to run. The outputs are the build libraries, binaries or (DE) errors. In contrary, executing too few targets in a build test results. To be able to support developer’s daily activities, the may use up all the machines allocated to run builds. build service system runs millions of builds per day. It is a known issue that a build with many targets could run out In this paper, we propose a build target batching service of the allocated memory or exceed its execution deadline. This (BTBS) that partitions a large stream of targets into batches is problematic because it reduces the developer’s productivity, and creates a build for each batch of targets such that those e.g. code submissions or binary releases. In this paper, we builds do not have OOM or DE errors. The technique relies on propose a technique that predicts the memory usage and executor a memory estimation model and an executor occupancy esti- occupancy of a build. The technique batches a set of targets such that the build created with those targets does not run out mation model. The memory model predicts the peak memory of memory or exceed its deadline. This approach significantly usage of a build. The occupancy model predicts the average reduces the number of builds that run out of memory or exceed executor occupancy of a build. The technique partitions a large the deadlines, hence improving developer’s productivity. stream of targets into target batches such that the build with the flags and each batch of targets consumes a limited memory I. INTRODUCTION or executor occupancy. The results show that BTBS generates Google has a monolithic codebase and it grows rapidly with few OOM and DE builds with 0.08% OOM rate and 0.05% an average of more than 100,000 code submissions per day. DE rate, which saves a lot of computational resources used To make sure that the new changes do not break the existing for build failure retries. codebase, Google has adopted Continuous Integration (CI) [7], The paper makes the following contributions: [11]. For each code change, a CI system first uses the build • It presents the first technique that effectively creates builds service system [14] to build the affected libraries and binaries. from a large stream of targets with the goal of minimizing Then, it runs the affected tests to check if the built libraries the number of OOM and DE errors. It is also the first and binaries are working as intended. technique that predicts memory and occupancy usages of Google’s build service system uses Bazel [1] to build build executions. libraries and run tests. Bazel takes as input a set of build arXiv:2102.09098v1 [cs.SE] 18 Feb 2021 • It demonstrates that the technique is able to reduce the OOM specification files that declare build targets. We refer to a build rate to 0.08% and the DE rate to 0.05%. Our past experience target as a target in the rest of this paper. A target specifies shows that BTBS is critical and improves the developer’s what is needed to produce an artifact, such as a library or productivity by reducing build failure retries due to OOM binary. A test target specifies what is needed to run tests and or DE errors. check if a code change breaks the codebase. Bazel decides how to build a given target based on the target’s specification. II. BACKGROUND As the codebase grows, each change may affect a lot of targets. For example, a code change on a common library like A. Bazel Guava [4] could affect many Java targets. A C++ compiler Bazel is an open-source build and test tool and is widely change could affect all C++ targets. Moreover, the postsubmit used within Google. It is responsible for transforming source service needs to guarantee that all targets in a given change code into libraries, executable binaries, and other artifacts. revision compile and pass all tests. The above use cases Bazel takes as input a set of flags and targets that programmers are common and require Bazel to build up to millions of declare in build files. It supports a large number of command targets at once. Running a large number of targets in a build line options and these options can affect the way Bazel execution may cause out of memory (OOM) errors in Bazel generates outputs. TABLE I: Example build flags Category Example Error checking –check_visibility enables checking if all dependent targets are visible to all targets to build. Tool flags –copt specifies the argumentsto be passed to the C++ compiler. Build semantics –cpu specifies the target CPU architecture to be used for the compilation. Execution strategy –jobs specifies a limit on the number of concurrently running executors during build execution. Output selection –fuseless_output restricts Bazel to generate intermediate output files in memory. Platform –platforms specifies the labels of the platform rules describing the target platforms. Miscellaneous –use_action_cache enables Bazel’s local action cache. java_library( java_library( java_test( name = "HelloWorld", name = "HelloWorldTest", name = "AllTests", srcs = ["HelloWorld.java"], srcs = size = "small", ) ["HelloWorldTest.java"], tags = ["requires-gpu"], deps = [":HelloWorld"], deps = [":HelloWorldTest"], ) ) Fig. 1: Example Java targets BTBS splits a stream of targets into batches and creates a build for each batch of targets. Those builds are enqueued to the scheduling service. The scheduling service waits until there are available resources and dequeues a build to a Bazel worker for execution. The worker allocates a fixed amount of memory for each Bazel process and a build runs out of memory if the Bazel process uses more than the allocated memory during Fig. 2: Build service architecture execution. Bazel translates the build flags and targets into Table I shows different flag categories with examples. actions, and sends those actions to the executor cluster for For example, the -fuseless_output flag restricts Bazel to the actual execution. For example, if a test requires GPU then generate intermediate output in memory instead of writing it Bazel will send it to the GPU executors. The executor cluster to the disk. The -jobs flag specifies a limit on the number of has a large but limited number of executors and each executor concurrently running executors during a build execution. So talks to a Bazel process to execute actions. The executor -fuseless_output and -jobs can significantly affect the cluster also has an action cache to minimize duplicate work. memory usage and executor occupancy of a build, respectively. Each Bazel process is configured to use a limited number of Figure 1 shows some example Java targets. HelloWorld is executors concurrently to avoid the case where a very large a Java library target that compiles HelloWorld.java into a build uses a lot of executors and blocks other build executions. library (java_library is the target rule). HelloWorldTest As a consequence, executing a large build that could use more is a Java library target that compiles HelloWorldTest.java executors than the limit causes the additional actions to queue into a library and it depends on the HelloWorld target until some executors become idle again. This may cause the because HelloWorldTest.java uses HelloWorld.java. build to exceed its deadline and we call these builds Type AllTests is a Java test target that runs the HelloWorldTest I DE builds. It is also possible that some actions depend on library using JUnit and the execution requires GPU. When other actions and they must be executed in sequence. This a programmer issues a command to build a target, Bazel may cause build to exceed its deadline if some actions are first ensures that the required dependencies of the target are long running. We call these builds Type II DE builds. An built. Then, it builds the desired target from its sources and action uses both memory and executor in the executor cluster. dependencies. When a programmer issues a command to run We use an executor service unit (ESU) to unify the expense a test target, Bazel will first build all dependencies of the test of both executor memory and CPU, and 1 ESU is equal target and then execute the tests. to 2.5GB of memory or 1 executor. Note that the executor memory usage is different from the Bazel memory usage. We B. Build Service System refer to the occupancy usage as the ESU used by a build in BTBS takes as input a set of flags and a stream of targets, the rest of the paper.

Load more