CloudBuild: Microsoft's Distributed and Caching Build Service

Hamed Esfahani, Jonas Fietz, Qi Ke, Alexei Kolomiets, Erica Lan, Erik Mavrinac, Wolfram Schulte, Newton Sanches, Srikanth Kandula
Microsoft

ABSTRACT

Thousands of Microsoft engineers build and test hundreds of software products several times a day. It is essential that this continuous integration scales, guarantees short feedback cycles, and functions reliably with minimal human intervention. This paper describes CloudBuild, the build service infrastructure developed within Microsoft over the last few years. CloudBuild is responsible for all aspects of a continuous integration workflow, including builds, test and code analysis, as well as drops, package and symbol creation and storage. CloudBuild supports multiple build languages as long as they fulfill a coarse grained, file IO based contract. CloudBuild uses content based caching to run build-related tasks only when needed. Lastly, it builds on many machines in parallel. CloudBuild offers a reliable build service in the presence of unreliable components. It aims to rapidly onboard teams and hence has to support non-deterministic build tools and specification languages that under-declare dependencies. We will outline how we addressed these challenges and characterize the operations of CloudBuild. CloudBuild has on-boarded hundreds of codebases with only man-months of effort each. Some of these codebases are used by thousands of developers. The speed-ups of build and test range from 1.3× to 10×, and service availability is 99%.

Keywords

Build Systems, CloudBuild, Distributed Systems, Caching, Specification Languages, Operations

1. INTRODUCTION

Microsoft is rapidly adopting an agile methodology to enable a much faster delivery cadence of days to weeks. For instance, Office 365, Visual Studio Online and SQL Azure have release cycles ranging from 3 months to 3 weeks. On the extreme end, Bing's website ships several times a day.

Delivering rapidly requires that builds, which are at the core of the inner loop of any engineering system, are fast and reliable. This challenge has given us an opportunity to design a new in-house infrastructure for continuous integration known as CloudBuild. Its design was motivated by these needs:

● execute builds, tests, and related tasks as fast as possible,
● on-board as many product groups as effortlessly as possible,
● integrate into existing workflows,
● ensure high reliability of builds and their necessary infrastructure,
● reduce costs by avoiding separate build labs per organization,
● leverage Microsoft's resources in the cloud for scale and elasticity, and lastly
● consolidate disparate engineering efforts into one common service.

We found it difficult to simultaneously satisfy these requirements. Most build labs, for instance, have only a few dependencies on other systems; they connect to one version control system or build drop storage solution. To avoid separate build labs, CloudBuild supports the union of all lab dependencies and workflows. However, the consequent interoperability issues can lead to lower reliability. As another instance of conflicting requirements, product groups have optimized their builds (and build labs) over many years, often exploiting their particular product architecture and workflow to optimize for incremental and parallel builds. Since CloudBuild offers a generic cloud-based solution across the various product groups, our current speedup improvements are less dramatic (not 100× for example) relative to the highly optimized custom systems.

When architecting CloudBuild, we had to make a crucial decision. We found that existing build tools and specification languages were unsuitable for cached and parallel execution. In particular, the languages would under-specify dependencies and the tools produced non-deterministic outputs. Hence, while single machine (or single threaded) execution would run correctly, distributed execution and caching of task outputs could lead to unpredictable behavior or limited speedup. A clean solution would be to design a new build specification language, rewrite all build specifications in this language and force tools to respect additional constraints. Such efforts are underway [?, 27]. CloudBuild, however, is our quick solution. That is, here we show how to onboard existing specification languages (and tools) onto a cloud-based parallel and cached build system. The key advantage of this choice is the speed of onboarding. Builds and tests can be sped up and disparate labs can be consolidated into the cloud quickly. We also found CloudBuild to be able to cover a wide range of tools and specifications. The cost, however, is the rather heuristic nature of CloudBuild: our goal is not to function correctly in every possible setting (of specification and tool behavior). We mitigate this with customer education and hardening of CloudBuild to cover frequently recurring issues.

We believe that our design decision has been successful. CloudBuild is currently being used by more than 4,000 developers in Bing, Exchange, SQL, OneDrive, Azure and Office. It executes more than 20K builds per day on up to 10K machines spread over several data centers.

ICSE '16 Companion, May 14 - 22, 2016, Austin, TX, USA
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4205-6/16/05 . . . $15.00
DOI: http://dx.doi.org/10.1145/2889160.2889222
Build and test speeds typically improved by 1.3 times to 10 times compared to earlier lab based systems, sustaining an availability of 99.9%. Reliability, defined as the fraction of requests that are served without a CloudBuild internal error, is better than 99%. CloudBuild connects with three different version control and four binary storage systems. REST and Azure's Service Bus [?3] are used for integration into multiple workflow systems.

While CloudBuild's service design follows existing designs for large scale data-parallel systems [?3, ?8, ?1], CloudBuild's build engine is unique in that it allows the execution of arbitrary build languages and tools, even in the presence of under-specified dependencies and non-deterministic tools.

Given a build request, CloudBuild evaluates build specifications, typically expressed in Make or MSBuild files, to infer project-to-project level dependencies. This means that CloudBuild's unit of scheduling is a coarse grained project. While not as fine grained as other systems like Google's Bazel [?], the approach allows greater reuse of pre-existing build specifications, without having to comprehend all the runtime semantics of the underlying build tools.

CloudBuild can determine whether to execute a given project or to reuse outputs captured during an earlier execution and stored in a system-wide cache. The determination is based on the evaluation of the pertinent source files and parent projects, which is summarized into a cache lookup key for the project. The approach to compute this key ensures that the build output is deterministic and consistent with other projects in the same build.

Finally, CloudBuild schedules work, i.e. the tasks for each project that needs to be built, in a location-aware manner over multiple worker machines. Locality here implies two things: freshness of the cache at the machine as well as freshness of its enlistment from source control. Without this locality, the build-preparation time to configure the machine with the right set of SDKs and sources before it can participate in the build would be substantial. CloudBuild uses standard DAG scheduling logic. The tasks themselves have heterogeneous resource demands. The dependencies between tasks are more arbitrary than in data parallel clusters (no shuffles for example). Furthermore, unlike the case of data-parallel jobs where outputs are much smaller than the input (e.g., queries in TPC-DS [16] or TPC-H [17]), the outputs of the median build job are quite large since many libraries and executables are generated and packaged. This requires CloudBuild to more carefully pipeline the fetching of inputs with the execution of tasks.

For a given task, CloudBuild creates a dedicated file system directory structure in which build tools are to execute. We call this a sandbox, and it addresses reliability issues due to under-specified dependencies and allows for multi-tenant builds. This is similar in intent to Unix's chroot jail [?3], though it is implemented at the application level. As we will explain later, cache and sandbox play together to guarantee the absence of "Frankenbuilds", i.e. builds where outputs from different build jobs can combine in inconsistent ways due to cache re-use.

Despite allowing so much flexibility for specifications, the resulting CloudBuild system is fast and reliable. At Microsoft, we have observed that 70 to 90% of build artifacts can be shared; this number varies across code bases. And CloudBuild's overhead for distributed builds is small: with enough builders, ?9% of builds take no more than 1.?0× the total time cost of the most expensive path in the directed task graph described in the build specifications.

CloudBuild's main contribution is thus to allow for reliable, fast builds in the presence of under-specified dependencies and non-deterministic tools. CloudBuild's engineering contribution is that it is easy to use, scales to varying demands and integrates well into Microsoft's engineering fabric.

In the rest of this paper, we describe the design of CloudBuild (§2–§3). Experiential anecdotes are offered throughout the design sections to clarify the various design choices. We then characterize some operational aspects of the CloudBuild service (§4.1) and share onboarding and live-site experiences (§4.2–§4.3). We conclude with a discussion of related work (§5) and lessons learnt.

2. PRIMER ON BUILD SERVICES

Today, build systems such as Make, Maven, Gradle and MSBuild execute on one machine, typically the developers' desktops or laptops. As the size of code bases grows, these tools can take several hours for a full build. Delta builds, that is, rebuilding after changes, are faster. However, continuous integration workflows often require packaging binaries and/or running exhaustive batteries of tests.

To speed up the build workflow, we note that there is substantial opportunity to parallelize. While dependencies do exist, they only form a partial order over the build tasks, and many build tasks are unordered with respect to each other. As a consequence, various efforts are under way to provide multi-threaded builds and to distribute the build tasks over many nodes.

Furthermore, there is an opportunity to reuse the outputs of previous builds, similar to delta builds. Build reuse arises fundamentally due to highly modularized software engineering practices. Teams of developers work on specific components and can reuse the latest binaries for the rest. Even across source control branches, there is a high likelihood of sharing due to the use of shared libraries for commonly used or important functionality.

In this context, CloudBuild offers a distributed and caching build service. We offer a quick summary of relevant background.

2.1 Source Control

Enlistments are (partial) materializations of source files from a version control system. CloudBuild supports different version control systems, including Source Depot [1?], Git [7] and TFVC [1?]. New changes registered in source control are a potential trigger of new builds for verification.

2.2 Build Specifications and Tests

Build specifications describe the items that are to be built, the dependencies between items and the actions to be taken per item. Example specifications include Makefiles and project files in MSBuild [9]. A codebase typically has multiple file-system directories with many specification files, typically one specification file per build target. A build target is a logical entity comprising a binary file and ancillary files (e.g., debugging symbols). Dependencies between build specifications are expressed by referencing the paths of other build specs, or the paths of files produced by the targets.

Build specifications also describe the details of how to build a particular target. This is done either explicitly or by invoking rules defined in other files. For example, the rule to compile a dynamically linked library from C++ would specify a set of steps using any number of tools (compilers, linkers, etc.) as well as command-line options. Distributions and SDKs often reference rule files that can be reused. However, teams or organizations can customize these rule files to satisfy technical or policy requirements. Further, many aspects of the common rules can be parameterized, extended, or changed from within a build spec. CloudBuild supports in principle any build language that declares which sources are read, which project-dependencies exist, and which output directories are used; special handling is

Codebase   Speed-up   Onboarding time   # Devs
B          1.6        3 mm              1200
C          7.3        3 mm              3?0
D          3.1        2 mm              100
E          3.8        2 mm              200

Table 1: Shows the ratio between the build times with and without CloudBuild. Original build times ranged from 1 to ? hours. The table also quantifies onboarding effort (mm denotes a man-month), and the number of committers to each codebase.

Tool       Rel. Freq.   Tool          Rel. Freq.
perl       0.11         nmake         0.03
mkdir      0.09         buildoutput   0.03
echo       0.08         copy          0.03
cmd        0.07         dfmgr         0.03
exportls   0.06         signtool      0.02
cl         0.0?         link          0.02
binplace   0.0?         getfilehash   0.01
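The heavy-tailed tool usage that Figure 1 plots can be reproduced from raw per-tool counts with a short sketch. The tool names and counts below are hypothetical, not Codebase A's data:

```python
def top_k_usage_share(counts, k):
    """Fraction of all tool invocations covered by the k most-used tools."""
    total = sum(counts.values())
    top = sorted(counts.values(), reverse=True)[:k]
    return sum(top) / total

# Hypothetical heavy-tailed counts: a few tools dominate, many are rare.
counts = {"perl": 110, "mkdir": 90, "echo": 80, "cmd": 70, "rare1": 2, "rare2": 1}
print(round(top_k_usage_share(counts, 4), 3))  # → 0.992
```

Sweeping k from 1 up to the number of distinct tools yields exactly the kind of cumulative-usage curve shown in Figure 1.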

Figure 1: CDF of tools by usage and the names of the most frequently used tools across all of Codebase A's build specifications.

given to the two major build languages currently in use at Microsoft, nmake [10] and MSBuild [9].

As described in the introduction, CloudBuild is also responsible for the execution of tests that follow compilation and linkage. The test execution is distributed in the same way as the build steps. CloudBuild supports tests written in VSTest [18] and its ecosystem of adapters for different languages and unit test frameworks, including nUnit and xUnit. CloudBuild also supports various code analysis frameworks.

2.3 Build Tools

Build specifications happen to use many kinds of tools. Figure 1 lists the top few tool names and their frequency in all of the build specifications of codebase A.¹ A total of 1388 different tools are used in codebase A. The distribution of usage is heavy tailed; 50 tools (3.6% of all unique tools used) contribute ?9% of the overall usage. However, several of the most frequently used tools are script invocators: perl and cmd for example. The latter is Windows' shell invocator, similar to sh or bash. There are also recursive calls such as those to nmake. Hence, this characterization of tool frequency under-estimates the diversity of tools. Yet, it suffices to show that a wide variety of tools are used by typical codebases.

As mentioned in the introduction, a key goal of CloudBuild is to onboard existing build specs with minimal additional effort. Hence, CloudBuild has to support all of these tools. As some of the code bases are quite old, the associated tools are equally old. But they are so widely used that rewriting them is not easy. Further, several tools depend on their environment in idiosyncratic and under-specified ways that make them a challenge to parallelize. Many of these tools are also not deterministic, so reinvoking the tools could create different outputs; a typical cause of non-determinism is that tools may have timestamps or randomly generated bits in their output.

¹ While we anonymize the names of codebases, we will relate various aspects of the same codebase by consistently referring to them with the same letter.

3. DESIGN OF CloudBuild

3.1 Design Principles

The primary goal of CloudBuild has been to reduce build times and increase build reliability for as wide a range of Microsoft teams as possible with as little investment as necessary. To achieve that, CloudBuild follows these principles:

● Commitment to compatibility: CloudBuild was designed to replace existing build pipelines, and as such should be substantially compatible with pre-existing build specifications, tools, and SDKs. Given the size, age, and heterogeneity of Microsoft codebases, approaches that require rewrites or major refactoring of build specifications were considered prohibitive by most teams.

● Minimal new constraints: The only prescriptions made and changes demanded by CloudBuild are those that would otherwise hinder parallelizing the build, or make caching ineffective. The system provides mechanisms to assist with onboarding legacy specifications and, once onboarded, to maintain those code bases within those constraints.

● Permissive execution environment: Safety under adversarial inputs is not a goal. That is, it is possible to write build specifications or tools that result in unpredictable build behavior. However, when used as intended, CloudBuild guarantees deterministic build output. Further, CloudBuild is a multi-tenant service and limits the impact on a customer from others that share the same caches or servers.

Table 1 provides data from experience to quantify the outcomes of these principles. Each row is a codebase with hundreds of developers that now runs on CloudBuild. Speed-up is very difficult to compute because the underlying build specifications have evolved substantially after onboarding. In every case, the new build specs are much more comprehensive than the historical ones that pre-date CloudBuild. Nevertheless, we compare the build time before onboarding onto CloudBuild vs. the latest build times. The table shows that the speed-ups are sizable. The table also shows that the time to onboard each codebase was a couple of man-months. It would not have been possible to onboard as rapidly, i.e., move to a distributed cached build execution, if CloudBuild did not adhere to the above principles.

3.2 Architecture Overview

Figure 2: An architecture diagram for CloudBuild

CloudBuild builds on top of Microsoft's cluster management system, Autopilot [33]. Autopilot (AP) manages the deployment of services as well as the monitoring of hardware and service health. AP applies automatic recovery techniques to deal

Codebase   LOC          Build targets   Targets needing        Specification files
                                        input an.  output an.  number   type            parse time
B          2.8 × 10^7   8263            287        21?         11300    mostly nmake    8.8s
C          ?.8 × 10^6   2713            178        89          33?9     mostly nmake    36.6s
D          −            2076            269        112         2267     mostly msbuild  3.7s

Table 2: Build targets and those that required explicit dependency annotations. Codebase B uses an automatic build migration tool that leads to a higher-than-typical number of annotations. The table also lists the number of specification files parsed by CloudBuild for dependency inference; this process takes only about one minute, even for some very large codebases.

with hardware and software faults, and aims to maintain high service availability during maintenance and software upgrades.

CloudBuild consists of a large number of worker machines managed by a small set of collaborating processes collectively referred to as the cluster manager. Once Autopilot deploys the CloudBuild runtime to a machine, the worker publishes its presence and state to a Zookeeper [?]-like service. This service maintains the list of available enlistments, machine allocation to ongoing builds, and machine readiness for new builds. The cluster manager can shrink or grow the number of workers or enlistments per codebase. Workers interact with source control (§2.1) and package managers [11] to download the necessary content into the enlistments.

The cluster manager accepts build-initiation requests. The input includes a build definition that lists configuration options such as: output type (e.g., "build for 64-bit architecture"); parallelization settings (e.g., the range of node counts to use); routing preferences (e.g., "prefer datacenters in US East region"). The input also includes per-request options (e.g., the code base revision to be built). Build jobs are logged and added to a build queue.

Build jobs remain in a pending state until they are matched up with a sufficient number of workers with matching enlistments. If many workers are available, the cluster manager selects those that performed similar work recently, under the assumption that their enlistments (and local caches) will be freshest. Chosen workers are asked to update their enlistments to the desired source and package revisions. Such preparation happens in parallel. Once enough workers are ready, the build job begins by spawning a build coordinator and multiple builder nodes.

3.3 Extracting Dependency Graph

The dependency graph of a build determines the order of execution. Existing build languages such as nmake [10] and MSBuild [9] allow dependencies to be under-specified. Consider the example in Figure 3. The syntax is inspired by make. Each target has a list of explicitly declared inputs and a corresponding set of commands. At first blush, the two targets appear unrelated. However, T2.dll uses y.dll and z.conf, which are produced as side-effects of executing T1.dll. Sequential builds, which the build spec was written for, will succeed. However, a parallel engine that builds T2.dll first will either fail or produce outputs using stale versions of the dependencies. A distributed engine that builds T2.dll on a different machine will behave similarly (i.e. either fail or use stale dependencies). Similarly, cached builds would not know to fetch the correct inputs.

Figure 3: Example illustrating hidden dependencies that break a parallel or cached build.

The fundamental problem is that widely-used build languages do not enumerate all of the inputs and the outputs of build targets. Neither do they explicitly call out the side-effects and resource dependencies of build commands.

A key component of CloudBuild is its ability to automatically infer hidden dependencies. It does so in a pre-processing step before the build. CloudBuild has a large set of rules to apply against the build specification ASTs that identify patterns for particular build languages and SDKs, and then emit the anticipated inputs and outputs. In some cases, invocations of external commands are also parsed. For example, when robocopy.exe [12] commands are used, CloudBuild parses the command line to infer the inputs and outputs. There are invocation parsers for many of the tools listed in Figure 1. These parsers reduce porting overhead; they also substantially reduce the need for the manual annotations that are described next.

When implicit dependencies cannot be inferred, users can add annotations. These are designed to blend into the build specification language in a way that is transparent, in case these specifications are executed outside CloudBuild. There are also rules to detect constructs that CloudBuild does not understand, such as invocations of external scripts; the system alerts the user, requesting that annotations be added to describe their inputs and outputs.

In some cases, build tools use runtime context to determine their inputs. Such context is unavailable in the pre-processing phase. Where possible, CloudBuild's parser over-approximates the list of inputs: it may include all files that can be referenced via all branches of the runtime logic; it may include whole directories, where the exact set of accessed files is known only at runtime.

The task dependency graph is constructed with one node per target. An edge is added between two nodes n1, n2 whenever there is overlap between n1's outputs and n2's inputs. Unresolved inputs, that is those not found in source control and not output by any target, are flagged as errors. Cycles in the dependency graph are also flagged as errors.

Architectural differences: The architecture is largely similar to other systems that schedule dependent sets of tasks such as Yarn [?8], Tez [3?] and Cosmos [?1]. A key difference thus far is the early choice of machines for builds (maintaining a map from machines to enlistments and caches and using locality). In contrast, data-parallel systems place tasks on any machine in the cluster. Since machines have to be prepared (pulling relevant sources from source control as well as SDKs and other packages needed to build), CloudBuild is more careful in picking machines for builds.
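The task-graph construction described in this section can be sketched as follows; the target and file names are hypothetical, and the data shapes are ours, not CloudBuild's:

```python
from collections import defaultdict

def extract_graph(targets, sources):
    """targets: name -> {"inputs": set, "outputs": set} (declared + inferred).
    Adds an edge (p, c) when c consumes a file that p produces; flags inputs
    that neither source control nor any target provides."""
    producer = {}
    for name, t in targets.items():
        for out in t["outputs"]:
            producer[out] = name
    edges, unresolved = set(), set()
    for name, t in targets.items():
        for inp in t["inputs"]:
            if inp in producer:
                edges.add((producer[inp], name))
            elif inp not in sources:
                unresolved.add((name, inp))  # flagged as an error
    return edges, unresolved

def has_cycle(nodes, edges):
    """Cycles in the dependency graph are also flagged as errors."""
    children = defaultdict(list)
    for a, b in edges:
        children[a].append(b)
    state = {}  # 1 = on the current DFS path, 2 = fully explored
    def dfs(n):
        state[n] = 1
        for c in children[n]:
            if state.get(c) == 1 or (state.get(c) is None and dfs(c)):
                return True
        state[n] = 2
        return False
    return any(state.get(n) is None and dfs(n) for n in nodes)

# Figure 3's hidden dependency, once inferred: T2 consumes side-effects of T1.
targets = {
    "T1": {"inputs": {"t1.cpp"}, "outputs": {"T1.dll", "y.dll", "z.conf"}},
    "T2": {"inputs": {"t2.cpp", "y.dll", "z.conf"}, "outputs": {"T2.dll"}},
}
edges, unresolved = extract_graph(targets, sources={"t1.cpp", "t2.cpp"})
print(edges, unresolved, has_cycle(targets, edges))  # → {('T1', 'T2')} set() False
```

Once the hidden y.dll and z.conf dependencies are inferred, the edge T1 → T2 forces the correct ordering that a sequential build got only by accident.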
Careful machine selection amortizes this preparation cost and improves the cache hit rate. Furthermore, data-parallel systems need not infer hidden dependencies (§3.3), since their tasks are perfectly specified by user queries, and they do not isolate the various tasks at runtime (§3.4). Differences in scheduling are described later (§3.6).

Besides the task nodes emitted for each target, CloudBuild may emit additional nodes for derived tasks, including: execution of tests and static analysis tools, aggregation of code coverage and analysis reports, and uploading of build outputs to durable cloud storage.

Table 2 offers some experience numbers from three large codebases that run on CloudBuild. The data is for a full build of each codebase. We see that parsing the specifications to infer dependencies takes only about one minute, even for large codebases. The parsing happens during a pre-processing step. We also see that the codebases have thousands of targets and about 1.2 times as many specification files. Recall that there can be many common or project-specific specifications that list build rules, in addition to those that refer to the targets. The table also shows the number of cases that require explicit annotation. Most of the work is handled by automatic inference. The remainder are fixed by annotations. Codebase B has a more-than-typical number of annotations because it uses an auto-annotation tool.

3.4 Sandboxing Build Tasks

CloudBuild executes each task in a sandbox. A sandbox is a directory with symbolic links for all of the task's inputs, where inputs include sources from the enlistment and the outputs of parent tasks. Additionally, the sandbox redefines a number of process environment variables which refer to particular locations in the enlistment. The redefinition steers the task to locations within the sandbox for inputs, outputs and scratch folder locations. Figure 4 illustrates this design.

Figure 4: Example illustrating sandboxes

Sandboxes are introduced for three main reasons: dependency verification, output collection, and isolation.

Output Collection: As also described above, build specifications do not list all of their outputs. To determine the outputs for a task at runtime, CloudBuild enumerates all directory entries in the sandbox's root directory. Any file entry that is not a symbolic link is considered part of the task's output, has its contents hashed, and is moved into a build job-wide location outside of the sandbox.

Isolation: Ideally, to avoid unpredictable behavior during parallel execution, a build task should not mutate its inputs, nor should two tasks produce outputs into the same file path. In practice, neither of these criteria was met in the build specifications that CloudBuild aimed to support. The sandboxing approach described above allows the system to detect such conflicting writes and attribute them to a particular task, even when multiple tasks are executed at the same time on the same machine. CloudBuild has multiple policies to deal with such cases: it can block any such conflicts; it can allow them, provided conflicting accesses write identical content (suboptimal, but deterministic); it can even allow writes with different content, when the resulting non-determinism is acceptable to the codebase owner.

Recycling: To minimize costly interactions with the file system, CloudBuild recycles sandbox directories. It retains the sandbox directories that have been created for previous build tasks as well as an in-memory representation of their file system state. When creating a new sandbox, CloudBuild uses the in-memory representation to locate the best fitting sandbox directory. Using that as a starting point, CloudBuild adds and removes symbolic links to reflect the dependencies of the incoming task.

Experience data: Figure 5 offers some data about sandboxes. It shows the number of symbolic links required and the time to set up, as CDFs over all sandboxes. Most sandboxes require between 10^2 and 10^4 symbolic links. Targets with many inputs tend to be those that produce deployment packages or layouts. We see that the setup time can often be much smaller than the number of symbolic links would suggest, due to our recycling of sandboxes. Most sandboxes are set up within a few seconds.

Figure 5: Quantifying sandbox size (number of symbolic links) and the time to set up sandboxes. The gap between the CDFs is due to our recycling of sandboxes. Sandboxes of medium size need a few thousand symbolic links and take about 0.3s to set up. Some targets (MSIs) have more than 10^5 inputs.

Alternate designs: Fundamentally, CloudBuild requires three properties from the file system environment in which the build tool executes. First, build tasks should be isolated. That is, a task should behave as if it is the only one running on a machine even though many tasks are running in parallel. Second, all of the inputs and only the inputs should be available to the build task. Third, it should be possible to rapidly set up and tear down such an environment. Operations like BSD's [?3] chroot jails would have been useful, but were not available in our operating system of choice. Moreover, the build specification already expected a root directory to be established through a process-specific environment variable; this allows CloudBuild to do "application-level virtualization" by setting up sandboxes.

Dependency Verification: As we saw above, build specifications can have dependencies that are not explicitly declared.
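A minimal, POSIX-only sketch of the sandbox mechanics described in this section; CloudBuild's real implementation also recycles link farms, redirects environment variables, and hashes outputs, all omitted here:

```python
import os
import tempfile

def create_sandbox(input_paths):
    """Populate a fresh sandbox directory with one symbolic link per input,
    so the task sees only its declared and inferred dependencies and a
    missed dependency fails fast as "file not found"."""
    root = tempfile.mkdtemp(prefix="sandbox-")
    for p in input_paths:
        os.symlink(os.path.abspath(p), os.path.join(root, os.path.basename(p)))
    return root

def collect_outputs(root):
    """Output collection: every non-symlink entry left in the sandbox root
    is treated as an output of the task."""
    return sorted(name for name in os.listdir(root)
                  if not os.path.islink(os.path.join(root, name)))

# Hypothetical task: one input file; the "tool" writes one output file.
src = tempfile.NamedTemporaryFile(suffix=".cpp", delete=False)
src.close()
box = create_sandbox([src.name])
with open(os.path.join(box, "out.dll"), "w") as f:
    f.write("binary")
print(collect_outputs(box))  # → ['out.dll']
```

Because the inputs are links and the outputs are regular files, a single directory enumeration separates what the task consumed from what it produced.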
_e infer- up sandboxes. Virtual machines are expensive to setup and tear- ences of dependencies made by CvÕn­BnTv­ may therefore be in- down especially since the virtual disk for each sandbox has diòer- complete (§k.k). Mistakes in dependencies will lead to false cache ent ûles (the inputs). We also considered using Drawbridge [}}], a ûngerprints and hence false cache hits. To solve this,CvÕn­BnTv­ lightweight process isolation container. While it allows for safe de- populates a task’s sandbox with only its explicit and inferred depen- touring of ûle system operations, the cost to on-board thousands dencies. _ereby,CvÕn­BnTv­ ensures that any missed dependen- of build tools into Drawbridge was considered prohibitive. Copy- cies will be runtime errors – typically “ûle not found”. Such errors on-write snapshots help reduce performance impact of moving data are surfaced to the user, the sandbox’s outputs are discarded, and the into and out of sandboxes. Tool compatibility constraints have thus child tasks are not executed. far limited our ability to use a ûle system that supports this feature; Since it is oen prohibitively expensive to create symbolic links this continues to be an area of further work. to every single ûle, CvÕn­BnTv­ may create symbolic links to the directories that contain the required ûles. CvÕn­BnTv­ uses De- 3.5 Distributed Cache of Build Outputs tours [k}] binary injection to observe ûle system APIs, and record To reuse the work of prior builds,CvÕn­BnTv­ uses a distributed the actual ûles accessed. Accesses beyond those predicted by the cache. Conceptually, the cache aims to map the entirety of the in- dependency parser are surfaced as errors. put values presented to a task execution to the output emitted by the task. If a future execution of the task is presented the same GCS no longer have any valid replicas due to network partitioning, inputs, it can be skipped and the cached outputs used instead. 
In concrete terms, the cache is implemented as a dictionary mapping CacheKeys, which encapsulate the values of task inputs, to output bags describing the names and contents of emitted task outputs. The following sections explore how the CacheKeys are computed, the architecture of the distributed cache, and our strategies to ensure that consistent content is retrieved from the cache, even in the presence of multiple concurrent builds and non-idempotent tools.

3.5.1 Cache Data Structures

Each output bag consists of an ID plus a set of (ContentHash_f, Path_f) tuples. The ID is a pseudo-random number assigned to an output bag at creation time to help with race conditions (see §4.3). Each tuple describes the content and path of a file emitted during the execution of the task described by the output bag.

Lookups into the cache are performed based on the inputs of a task. These inputs are encoded into a compact CacheKey as described inductively by the following equations. At a high level, FF denotes a file or task fingerprint, GF is a fingerprint of the global settings, and the CacheKey of a target depends on the fingerprints of its inputs.

    FF_s       = ContentHash_s ⊕ Hash(Path_s)         ∀ source files s
    FF_t       = CacheKey_t ⊕ ID_t                    ∀ tasks t
    CacheKey_t = GF ⊕ (⊕_{i ∈ inputs_t} FF_i)         ∀ tasks t

The first two equations show how to compute the fingerprint FF for each input source s and build task t, respectively. File content hashes can be retrieved from source control if available, or computed from file content. The file path is always used as part of a file's fingerprint, since path values are used as inputs to certain build tasks, such as those performing recursive structure-preserving directory copies. For targets, the fingerprint is derived from their CacheKey and ID. The third equation shows how to compute the CacheKey for a task based on the fingerprints of its inputs (both sources and parent tasks). GF encodes global settings configured in a build job, presumed to be inputs for every task.

3.5.2 Cache Service Design

CloudBuild's caching layer is implemented in an architecture similar to that of other distributed caching systems. The mapping from CacheKey to output bags is maintained by the Global Cache Service (GCS). GCS is persistent and consistent. It is built using Azure Tables. Each worker node runs a Local Cache Service (LCS) which maintains a partial copy of this mapping. Together, LCS and GCS also implement a distributed content addressable store (CAS), with LCS storing replicas of each piece of content available in the cache, and GCS serving as a cluster-wide tracker of such replicas. An LCS keeps local replicas of all content that is referenced or created by tasks on that node.

Reads from the cache are first made to the local LCS. If the CacheKey is present on the LCS, the corresponding CAS content is returned to the caller, together with the associated (ContentHash_f, Path_f) tuples. Otherwise, the LCS queries the GCS and, if present, fetches a copy of the contents of the output bag from one of the replicas to the local CAS.

Writes to the cache are handled in a write-through manner. That is, the LCS immediately pushes changes to the GCS and asks that some number of replicas be created. We discuss races in §4.3.

When in active use, popular cache items can have hundreds of replicas. Cache entries with no use are deleted after a period of time (usually a few days). At times, it is possible that items present in the GCS no longer have any valid replicas due to network partitioning, hardware failure, and a small class of maintenance activities. Such issues are detected during cache lookups, and result in the removal of the invalid cache entry.

Experience data: The load on the GCS is roughly O(10^4) reads and O(10^3) writes per second. Recall that CloudBuild uses Azure Tables to implement the GCS, thereby receiving a distributed table store with persistence and strong consistency.

3.6 Build Scheduling

Each build job is controlled by a build coordinator. It works by assigning individual tasks to builder nodes. For this, it tracks a list of unblocked tasks, and tries to assign these whenever resources on a builder are available.

The build scheduler uses several sets of information:
● The currently available, i.e., unblocked, tasks for which all dependencies are completed.
● An estimate of a task's resource profile based on task characteristics inferred during parsing.
● The resource profile and execution time of a task determined from historical build data.
● The longest path from each task to a leaf node.
● The longest path overall, also called the critical path.

The build scheduler uses the criticality of a task, defined as the longest sub-path from this task to any output of the DAG. This measure is used as the priority order in which tasks are scheduled.

The scheduler only considers actionable tasks in the DAG. Actionable tasks are defined as the subset of tasks that have not yet run, but whose dependencies are met because their parent tasks either finished or had matching CacheKeys. The actionable tasks are added to a queue of executable tasks, sorted by the above priority.

The actual assignment of tasks to a specific worker depends on input locality, the task's resource profile, and the availability of resources on builders. Per task – going by priority order – the coordinator iterates over the available builders, and through a combination of models tries to evaluate the fastest time to completion for this task.

Several factors influence the completion time. First, the setup phase needs to copy over inputs that are not already available on the machine. A builder that already contains many of the inputs will have a shorter setup time than one that needs to copy all of them over. Second, enough resources to run the task may not be available at the machine until some time in the future. The queuing delay is calculated based on the modeled execution times of tasks already running at the builder and those that are in the queue. Tasks are queued at the builder with the shortest expected completion time, i.e., the sum of both numbers.

CloudBuild schedules tasks eagerly to overlap the fetching of inputs with the execution of other tasks. Such eager scheduling is done carefully, however, so as not to make decisions far into the future.

Some classes of task execution errors are deemed retriable, and depending on the nature of the failure, a policy may be set in the task to seek retries on different builder nodes. In this case, the build coordinator excludes the previous execution site from the list of candidates for future placements of the task.

Effectively, CloudBuild uses critical path scheduling [26], but with added attention to multi-resource packing of build tasks and eager queuing of tasks so that fetching inputs can overlap the execution of other tasks. In particular, tasks have a large number of inputs, accessed in a rather unstructured manner (as opposed to a reduce task reading from every map task [8]). This forces CloudBuild to account more carefully for input locality by actually computing the time to fetch the inputs.
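The fingerprint equations of §3.5.1 can be rendered in code as follows. The concrete hash (SHA-256) and the realization of ⊕ as a hash over length-prefixed parts (rather than, say, XOR) are our assumptions for illustration; the paper does not specify them.

```python
import hashlib

def combine(*parts: bytes) -> bytes:
    """Stand-in for the paper's ⊕ combiner: hash length-prefixed parts."""
    d = hashlib.sha256()
    for p in parts:
        d.update(len(p).to_bytes(4, "big"))
        d.update(p)
    return d.digest()

def source_fingerprint(content: bytes, path: str) -> bytes:
    # FF_s = ContentHash_s ⊕ Hash(Path_s)
    return combine(hashlib.sha256(content).digest(),
                   hashlib.sha256(path.encode()).digest())

def cache_key(global_fp: bytes, input_fps: list) -> bytes:
    # CacheKey_t = GF ⊕ (⊕ over all inputs i of FF_i)
    return combine(global_fp, *sorted(input_fps))

def task_fingerprint(key: bytes, output_bag_id: bytes) -> bytes:
    # FF_t = CacheKey_t ⊕ ID_t: a parent's *specific* output bag
    # flows into its children's CacheKeys.
    return combine(key, output_bag_id)
```

Because a child's CacheKey depends on the parent's output-bag ID rather than only on the parent's inputs, regenerating a parent with a non-idempotent tool changes the child's key and forces a rebuild – the "Frankenbuild" avoidance discussed in §4.3.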

Figure 6: Arrival process of builds.    Figure 9: Targets in build job.

Figure 8: Cache hit rates for 50 builds of codebase B.
Figure 11: How long does a build take? Above, we show a CDF of the build times of codebase B.

The corresponding issue is simpler in data-parallel systems [3, 8, 21]. Much of our experiences with the build scheduler are documented in §4.1.

3.7 Managing Build Outputs

CloudBuild needs to support multiple mechanisms to facilitate access to build outputs. To offer backwards-compatible build output storage, CloudBuild can persist build outputs into SMB file shares [37]. This has been the primary mode of storage in the old Microsoft build labs that CloudBuild seeks to replace. CloudBuild can also save build outputs into the Artifact Repository, an internal content storage system designed on top of Azure Blob Storage [23] and integrated into Microsoft's Visual Studio Online [36] suite. This system allows for progressive differential data uploads during build execution; similarly, it offers customers progressive differential downloads, to reduce the end-to-end time to acquire all of the content produced by a build.

4. EXPERIENCES

We want to share some of our experiences and lessons learned from building and operating the service over the last few years.

4.1 Characterizing CloudBuild

CloudBuild runs in several datacenters and uses thousands of servers. Roughly twenty thousand builds are launched per day. Figure 6 depicts the arrival process of builds. Demand is mostly diurnal, with peaks at 11 AM and in the afternoon. During business days around 60% of the traffic is due to user requests; around 25% are continuous integration builds that are triggered by a code check-in; 10% are scheduled builds, configured to run at preset times.

Over 95% of the builds use the distributed cache, which is the default option. CloudBuild allows developers the option to not use the cache, which is useful for troubleshooting and onboarding new branches. Cache hit rates vary substantially based on the codebase being built. Figure 8 shows the cache hit rates for different builds of codebase B, which, as described in Table 2, is currently the largest codebase by size and execution volume. For 90% of the builds, at least 50% of the targets are cached, and for 50% of the builds, at least 80% of the targets are cached. The overall cache hit rate is 76%, which translates into significant savings. Caching helps both cluster throughput (builds per second) and build latency (time to finish a build).

To assess CloudBuild's performance, we compare build execution times with the critical path of the build dependency graph (Figure 10a). The distance in Figure 10a is on average about 20% of the total build time. Some of the difference is due to the copying of inputs and outputs; aspects that are not accounted for in our critical path calculation.

We also compare build execution times with an estimate of the total work in the build DAG (Figure 10b). This metric is the maximum, over all resources, of the sum of task durations times the demand for that resource, divided by the available capacity of that resource. If there were no other constraints, the total work would be a lower bound on the runtime. From Figure 10b, we see that the total work is between 10% and 20% of the actual running time.

A look at the timelapse of an example build of codebase B (Figure 7a) can explain why. The execution schedule shows several voids in which only a few tasks are running. This is because naively generated dependency graphs can have several "choke points" that limit parallelism. Choke points are instances when all of the unscheduled tasks depend on all of the previous tasks, allowing a single outlier to prolong the build [21]. Figure 7b displays a similar view for codebase C. This codebase has been running distributed builds for a number of years; over time its developers have optimized the build dependency graph to increase parallelism, thus reducing build times. Automatic identification of choke points in our codebases is an area of additional research.

Figure 11 shows that the maximum build time for codebase B is several times the median build time. The primary factor for this difference is caching. The range between min and max is about 10×. We believe that shaving off some of the fixed overheads would stretch this range further.

Table 3 shows the typical resource utilization of tasks. While on average tasks use relatively few resources, the standard deviation is high. From the maximum values, it can be seen that a one-size-fits-all approach to scheduling would not suffice. Multi-resource packing [28] could improve cluster throughput.

Table 4 takes a deeper look at the correlation between different kinds of task resources. It plots the covariance in usages of different resources. Overall, they do not correlate strongly, except for the data read and written by tasks. There is a weaker correlation between memory and data read or written. This is in contrast to tasks in data-parallel clusters, which for the most part process one row after another (joins are an exception). Many build tasks, such as compilation, linking, and tests, may retain more of their inputs (binaries and executables) in memory for the duration of the task.

(a) Codebase B (unoptimized)    (b) Codebase C (optimized over time)
Figure 7: Depicted above are two actual execution schedules for different builds. Each dark grey block represents a task, and the tasks are sorted by the machine on which they were executed. Figure 7a is an unoptimized build, while Figure 7b depicts a schedule for a different, optimized codebase.
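The total-work lower bound used in Figure 10b can be computed directly from per-task resource profiles; a sketch, with illustrative resource names and units:

```python
def total_work_lower_bound(tasks, capacity):
    """Max over resources of (sum over tasks of duration * demand) / capacity.
    tasks: [{"duration": seconds, "demand": {resource: amount}}, ...]
    capacity: {resource: cluster-wide available amount}."""
    bound = 0.0
    for resource, cap in capacity.items():
        area = sum(t["duration"] * t["demand"].get(resource, 0.0)
                   for t in tasks)
        bound = max(bound, area / cap)
    return bound
```

With no other constraints, no schedule can finish faster than this bound, since some resource must process its entire "area" of work at full capacity.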

[Scatter plots of real run-time (normalized) against the critical path (normalized) and against total area / processors (normalized), with the equality line and guide lines at x = 1.1y and x = 1.3y (left) and at x = 3y, x = 5y, and x = 10y (right).]

(a) Critical path    (b) Area
Figure 10: The figures above compare the runtimes of codebase B with two lower bounds: the length of the critical path, and the area. The figures show that codebase B is dominated by work on the critical path and that CloudBuild's scheduler is nearly optimal.

Overall, however, the lack of correlation means that there is opportunity to pack tasks with diverse demands, e.g., I/O-intensive jobs with CPU-intensive jobs.

Resource            Avg.   Stdev.   Max
Cores               0.2    0.23     5
Memory (MB)         10     93       3622
Data read (MB)      50     115      220
Data written (MB)   23     148      345
# Parents           9      3        38
# Children          9      67       202
Duration (s)        21     42       472

Table 3: Resource profiles of tasks.

        C        M        R        W        P       CH
C       –
M     0.145      –
R     0.178    0.49       –
W     0.103    0.390    0.87       –
P    -0.011    0.039    0.35     0.320      –
CH    0.014    0.074    0.071    0.067    0.016     –

C: Cores, M: Mem., R: Data read, W: Data written, P: Parents, CH: Children
Table 4: Correlation between pairs of resource demands of the various tasks.

4.2 People Experiences

Customer Education: While CloudBuild was designed to be highly compatible with old build specifications, it does impose additional constraints on what users specify. Because of sandboxing, the execution environment can be somewhat different from that of MSBuild and nmake. Hence, builds can experience unique failures in CloudBuild; even common failures may present themselves differently in a distributed environment. As large engineering populations onboard onto CloudBuild, we have had to fund substantial user education efforts, with both proactive documentation and troubleshooting consultancy. We have also implemented functionality to detect, triage, and explain failures, and to correlate them with pertinent documentation. Having said that, much work is still needed to identify systems that can automatically surface the most relevant content based on user-stated or implied needs. Another customer education direction was training to adapt build specifications to forms that can be more effectively parallelized. Figure 7 shows an example where the build on the right is more easily parallelizable than the one on the left (fewer void cycles, more work off the critical path, etc.).

Virtuous but spiraling demand: As codebases are onboarded and users see shorter build times, the demand for builds increases. Builds that would erstwhile be executed on desktops are now offloaded to CloudBuild. The lower cost also means that developers are much more inclined to submit "speculative builds" even where there is little confidence that the builds and tests will succeed. Developers get to wait less and check in more often, increasing the demand for pre- and post-checkin build jobs. We also saw the number of unit tests consistently increase. This is less of a problem, though, since such tests typically add little to the critical path of the build. CloudBuild continues to be substantially scaled out to accommodate all the dimensions of growth.

Evangelizing the best-of-breed: By becoming the most widely adopted build solution at Microsoft, our team has exposure to a large variety of codebases. This has enabled the team to identify best-of-breed tools, build specifications, patterns, and practices, and to evangelize them across different groups. We also funnel emerging requirements back to tool owners, and allow them to validate tool changes against their customer codebases on CloudBuild.

4.3 Technical Experiences

Races and Non-idempotence: It is possible that two concurrent build jobs observe a cache miss and race to execute the same task. Avoiding races requires serialization and hurts performance. Hence, rather than prevent such races, CloudBuild optimistically handles them after the fact. The issue with races is that build tools are not idempotent, i.e., running the same task twice can produce different outputs, for a variety of reasons. The output can contain environmental information such as a machine name or timestamp. Or, it could have some uninitialized strings with random values. To handle such races, CloudBuild only allows the first writer, deemed the winner, to update the cache. The others receive an indication that the task's output is already available, causing them to discard their local output and use the winner's output instead.

Customers expect provenance relationships between the outputs of a build job to hold just as if the build had been performed on a single machine without caching. For example, consider a build where a task t1 emits file raw\file.dll and a child task t2 emits compressed\file.dll.gz. A build output consumer can expect that unzipping file.dll.gz yields a file that is bitwise-identical to file.dll. Assume now that an earlier build job created cache entries for t1 and t2, but later t1's entry was deleted for some reason. If t1 is regenerated by a non-idempotent tool and t2's output is retrieved from the cache, file.dll and file.dll.gz will no longer match. To avoid this issue, CloudBuild includes the input's content bag identity when computing a target's CacheKey. In this example, t1's content bag identity is used to obtain t2's CacheKey, which leads to a cache miss and thus forces t2 to be re-built, avoiding "Frankenbuilds".

Build Job Cohorts: For large codebases, build durations and request arrival processes are such that there are often many concurrent jobs building the same codebase. Most developers tend to work close to the "head revision" of their codebase, so the majority of source code is common across concurrent jobs. The races described above are therefore very common. Later-starting jobs are likely to catch up, since they consume cached content created by the earlier-starting jobs. Once they catch up, they will pathologically race to execute further tasks (over common sources). This creates high demand for certain CacheKeys and causes load on the GCS to spike. It also wastes cluster resources in the work performed by the race loser. Federating schedulers so as to minimize races between different build jobs is an area for further exploration.

Non-Deterministic Build Outputs: Prior to CloudBuild, codebases were built using a single machine. While parallelism was possible, task execution order was mostly stable across build jobs, since the relatively naive scheduler had only the limited resources offered by a single machine. Conflicts between tasks can go unnoticed when task execution order is stable. For example, suppose a spec allows two tasks to output to the same file path; this would go unnoticed if the stable execution order ensured that the same task always won in single-machine builds. Since CloudBuild schedules over many machines and can receive variable amounts of cached outputs, the execution orders are no longer deterministic. This surfaced several hidden or latent bugs. The vast majority of onboarded codebases had to be patched to remove latent bugs. CloudBuild's consistency policies (see §3.5) helped detect some of these issues. Surprisingly, some codebase owners were happy to allow output non-determinism – in some cases because the affected output content was not used by the application at run-time; in others because any of the possible output values was acceptable.

5. RELATED WORK

Google's Bazel [5] system solves a similar problem – parallel and cached builds for large codebases. The key difference is that Bazel requires developers to write specifications and redesign their code layout. It offers a new build specification language and requires a specific directory structure. Both of these decisions can simplify dependency inference. In contrast, CloudBuild's primary goal is to quickly onboard teams with minimal changes to their existing build specs and tools. Further distinctions are hard to draw, since very little information is publicly available about the operational aspects of Bazel. The released version only targets single-machine builds. How they cache or schedule builds, the extent of support for arbitrary build tools, and their performance and time-to-onboard for new teams would all be very useful topics to compare in the future.

distcc [6] is another publicly available system. The unit of work is a preprocessed source code file that is sent over the network and compiled remotely. A pump mode allows preprocessing to also be executed on the distributed host. Compared to CloudBuild, it misses several steps, such as distributed linking or test execution. Linking is executed on the main host, and test execution is not part of its design. Further, distcc only supports gcc.

State-of-the-art build systems such as Maven [2], Ant [1], and SBT [13] are neither distributed nor cache build outputs; some offer parallelization at a coarse granularity [2]. The key ideas in CloudBuild improve upon ideas from prior work; our use of sandboxes is similar to [20], and our cache for build outputs is akin to [29].

Some prior work describes new build specification languages with formal properties [25, 30] and techniques to automatically migrate legacy specifications [27]. Other work reduces "debt" [38] or pares down build targets to only the outputs that are actually used [40]. CloudBuild is orthogonal to these efforts. We believe that such efforts are needed, and we are working towards these goals as well.

Yarn [8], Mesos [31], and other data-parallel computing infrastructures work with tasks that are already amenable to distributed execution, i.e., tasks with deterministic and idempotent behavior. These attributes also mean that they are easily runnable in parallel. Many of the challenges that CloudBuild solves are towards making build tools and specifications geared for single-threaded, single-machine execution work in a distributed setting.

The build scheduling problem (§3.6) is closest to scheduling jobs with dependencies (DAGs), as is common in data-parallel frameworks such as Tez [3], Hive [39], and SCOPE [24]. There are a few key differences, however. First, due to caching, the subset of the build DAG that needs to be executed varies dynamically. Second, unlike data-parallel jobs, where the data in flight reduces dramatically with each operation [19], build outputs are numerous and large in size. Hence, scheduling build DAGs needs more careful handling of dependencies.

6. FINAL REMARKS

CloudBuild was started to enable Bing's fast delivery cadence. It has since been rapidly adopted by many teams across Microsoft. The primary reason has been that instead of a few builds per day, developers could suddenly do many builds and thus run many more experiments. For instance, the number of builds for codebase B doubled within six months, despite the fact that the team did not change its development process. Changing engineering processes can improve agility further. For instance, a team that recently onboarded has 2000 targets and executes 10× more builds per day than they could two months ago using their old build-lab-based system.

Growing CloudBuild to support teams with diverse programming cultures (e.g., Bing and Office) was not without challenges. First, we struggled with the immense scale requirements. For instance, our load exceeded existing binary storage and file-signing solutions. While we could scale up or scale out some of the dependent systems, others had to be redesigned. We created a new content-addressable de-duplicated store in the cloud, which is currently used for drops, symbols, and packages. By exploiting CloudBuild's cache hit rates, we were able to reduce the ingress and egress volume for various storage systems by over 80%.

Another challenge was the requirement to integrate into the very different engineering workflows that existed across groups. In some cases, CloudBuild did not have proper abstractions for all integration points. When new requirements came up, we followed the open source model and allowed developers from other organizations to contribute. The resulting system slowly became fragile because of unintended feature interactions of the new dependencies. We took various steps to address this issue: workflow integration now follows a publish/subscribe model, i.e., integration is done externally. For simpler deployment, testing in production, and dependency isolation, we are currently redesigning CloudBuild to follow a more decoupled service design pattern. Finally, we have started to continuously apply Chaos Monkeys to improve the resilience and recoverability of our services.

In spite of the rapid growth of CloudBuild to a service used by thousands of engineers, CloudBuild continues to achieve its primary goal: improve the cycle time of the developer's inner loop – build and test – and do so reliably and quickly.

7. ACKNOWLEDGMENTS

The authors would like to thank current and past CloudBuild staff members and contributors: Stephan Adler, Andrew Arbogast, Jonathan Boles, Nick Carlson, Matthew Chu, Jacek Czerwonka, John Erickson, Steve Faiks, Maksim Goleta, Dmitry Goncharenko, Dedian Guo, Mario Guzzi, Jeffrey Hamblin, Cody Hanika, Kim Herzig, Zach Holmes, Jinjin Hong, Fei Huang, Rui Jiang, Paul Jones, Jeff Kluge, Ben Lieber, Eric Maino, Joshua May, Michael Michniewski, Paul Miller, Nick Pizzolato, Michael Pysson, Josh Rowe, Zack Runner, Vinod Sridharan, Stanislaw Szczepanowski, Carl Tanner, Suresh Thummalapenta, Umapathy Venkatachalam, Christopher Warrington, Rick Weber, Martha Wieczorek, Yancho Yanev, and Jet Zhao.

8. REFERENCES

[1] Apache Ant project. ant.apache.org.
[2] Apache Maven project. maven.apache.org.
[3] Apache Tez. http://tez.apache.org/.
[4] Apache ZooKeeper. zookeeper.apache.org.
[5] Bazel. http://bazel.io/.
[6] distcc. https://github.com/distcc/distcc.
[7] Git for Visual Studio. http://bit.ly/1vqcDFe.
[8] Hadoop YARN project. http://bit.ly/1iS8xvP.
[9] MSBuild. http://bit.ly/1FwkCEz.
[10] Nmake. http://bit.ly/1NgNzsE.
[11] NuGet. http://bit.ly/1OdeEJA.
[12] Robocopy. http://bit.ly/1OdeEJA.
[13] Scala build tool. www.scala-sbt.org.
[14] Source Depot. https://en.wikipedia.org/wiki/Perforce.
[15] Team Foundation Version Control. http://bit.ly/1ES8VLm.
[16] TPC-DS benchmark. http://bit.ly/1JCuDap.
[17] TPC-H benchmark. http://bit.ly/1KRKgl.
[18] VSTest. http://bit.ly/1L2nmPO.
[19] S. Agarwal et al. Re-optimizing data parallel computing. In NSDI, 2012.
[20] G. Ammons. Grexmk: Speeding up scripted builds. In Workshop on Dynamic Systems Analysis, 2006.
[21] G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris. Reining in the outliers in map-reduce clusters using Mantri. In USENIX OSDI, 2010.
[22] A. Baumann et al. Shielding applications from an untrusted cloud with Haven. In OSDI, 2014.
[23] B. Calder et al. Windows Azure Storage: A highly available cloud storage service with strong consistency. In SOSP, 2011.
[24] R. Chaiken et al. SCOPE: Easy and efficient parallel processing of massive datasets. In VLDB, 2008.
[25] M. Christakis, K. Leino, and W. Schulte. Formalizing and verifying a modern build language. In FM, Lecture Notes in Computer Science, 2014.
[26] E. G. Coffman and R. L. Graham. Optimal scheduling for two-processor systems. Acta Informatica, 1(3):200–213, 1972.
[27] M. Gligoric et al. Automated migration of build scripts using dynamic analysis and search-based refactoring. In OOPSLA, 2014.
[28] R. Grandl, G. Ananthanarayanan, S. Kandula, S. Rao, and A. Akella. Multi-resource packing for cluster schedulers. In SIGCOMM, 2014.
[29] A. Heydon, R. Levin, and Y. Yu. Caching function calls using precise dependencies. In PLDI, 2000.
[30] J. Hickey and A. Nogin. OMake: Designing a scalable build process. In FASE, 2006.
[31] B. Hindman et al. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI, 2011.
[32] G. Hunt and D. Brubacher. Detours: Binary interception of Win32 functions. In USENIX Windows NT Symposium, 1999.
[33] M. Isard. Autopilot: Automatic data center management. OSR, 2007.
[34] M. K. McKusick, K. Bostic, M. J. Karels, and J. S. Quarterman. The Design and Implementation of the 4.4BSD Operating System. Pearson Education, 1996.
[35] Microsoft. Azure Service Bus. http://bit.ly/1LjqUIf.
[36] Microsoft. Introducing Visual Studio Online. http://bit.ly/1OvpNez.
[37] Microsoft. SMB Protocol and CIFS Protocol overview. http://bit.ly/1Crljd7.
[38] J. D. Morgenthaler et al. Searching for build debt: Experiences managing technical debt at Google. In Workshop on Managing Technical Debt, 2012.
[39] A. Thusoo et al. Hive – a warehousing solution over a map-reduce framework. In VLDB, 2009.
[40] M. Vakilian et al. Automated decomposition of build targets. In ICSE, 2015.
[41] J. Zhou et al. SCOPE: Parallel databases meet MapReduce. In VLDB, 2012.