A Roundtable with Three Release Engineers
FOCUS: GUEST EDITORS' INTRODUCTION

The Practice and Future of Release Engineering: A Roundtable with Three Release Engineers

Bram Adams, Polytechnique Montréal // Stephany Bellomo, Software Engineering Institute // Christian Bird, Microsoft Research // Tamara Marshall-Keim, Software Engineering Institute // Foutse Khomh, Polytechnique Montréal // Kim Moir, Mozilla

IEEE SOFTWARE | PUBLISHED BY THE IEEE COMPUTER SOCIETY | 0740-7459/15/$31.00 © 2015 IEEE

RELEASE ENGINEERING focuses on building a pipeline that transforms source code into an integrated, compiled, packaged, tested, and signed product that's ready for release. The pipeline's input is the source code developers write to create a product or modify an existing one. Enterprises running large-scale websites and delivering mobile applications with millions of users must rely on a robust release pipeline to ensure they can deliver and update their products to new and existing customers, at the required release cadence.

This special issue provides an overview of research and practitioner experience, and this article in particular aims to give you insight into the state of the practice and the challenges release engineers face. It features highlights from interviews with Boris Debic, a privacy engineer (and former release engineer); Chuck Rossi, a release-engineering manager; and Kim Moir, a release engineer. We asked each of them the same questions covering topics such as release-engineering metrics, continuous delivery's benefits and limitations, the required job skills, the required changes in education, and recommendations for future research.

Every product release must meet an expected level of quality, and release processes undergo continual fine-tuning. What metrics do you use to monitor a release's quality? Do you roll back broken releases after deployment? If so, how?

Debic: Our main measures are threefold: the number of open bugs ranked by priority, the number and percentage of successful releases, and the number and percentage of releases that are abandoned late in the game. The first two measures allow us to gauge the overall release health of a service; the third measure can uncover issues in the testing pipeline or growing code complexity. We track these metrics and make comparisons from quarter to quarter.

Related to testing, another metric is the greenness of the testing pipeline. Many tests, from code to performance tests, are run daily in a continuous fashion. Stability of tests is a signal of product maturity and good engineering practices. Despite some arguments to the contrary, this measure effectively increases the velocity of product development and release.

We also track a host of more fine-grained metrics. Every step, with its duration, outcome, operation, logs, arguments, and other relevant details in execution and setup, is logged for every release that runs at our company. Refinements in the release system are direct results of observing patterns and quantifying different processes, tools, and approaches using this dataset.

To gain another perspective, we have systems that interact with our users, either by providing them a way to give direct feedback or by going through logs and looking for different types of failures. This data is distilled and presented to product teams as a collection of signals that speak of product robustness and of complaints that users mention most often.

For Web services and servers, "canarying" is another key component of successful releases. Canary rollout strategies depend on the type of service, user expectations, and contractual obligations. In this type of rollout, we gradually increase the exposure of the new binary and at all times monitor the critical parameters. Canaries are the bread and butter of the final stages of a well-designed release process.

Rossi: I'll talk about Web deployment first and then contrast it with mobile deployment. For Web deployment, we use the metrics of the code going into the master branch, the test results, and performance lab results. The next level includes metrics for products being released, such as core tests, unit tests, and performance experiments like time to interaction (TTI), fatal-error rates, the number of errors per page, and any new errors that we hadn't seen in the production logs.

Then comes the canary step. The set of binaries for a release sits in the canary state for 30 minutes to an hour. I look at the logging and flag new errors, error rate changes, and fatal or elevated error rates for an existing error. Core metrics include TTI; the number of likes, photos uploaded, and comments; and the amount of tagging. We compare the growth and interaction metrics from the canary to those from production. A release engineer and the developers look at them with more detailed dashboards. For example, the ad teams have dashboards on ad displays and ad click-throughs.

Our alerting system works on either absolute numbers or the percentage rate. The biggest alert for the Web is the log data for each new build. In a canary, we collect that log data separately from the regular production traffic. A website has thousands of firing errors and warnings, and we look for changes in those. An analysis of errors in the log data that differ from production is the first part of the canary. That's easy, and it's universal; it doesn't matter what your app is doing.

Another big alert is when the canary TTI is much higher than the production TTI. Is it because we just increased login calls by 10 times? Or because a database call to render the first page is not going through cache and it's trying to do a lookup every time? TTI helps us flesh out the problem. We pay particular attention to how long it takes the main page to render.

I have graphs for the back-end machines, but I look for effects on the front end. When I see those effects, I'll start digging down. I might see, for example, that only Internet Explorer 7 on Windows boxes is showing a bad TTI. Or I'll realize that the front end is rendering so slowly because I've lost half the back-end machines that are providing data for this service. I wouldn't have found that internally, but I will find it in canary because it is millions of people.

Mobile deployments are more challenging than Web deployments because we don't own the ecosystem, so we can't do all the things that we would normally do. And the canaries are huge. We watch cold start, warm start, the app size, and the numbers of photos uploaded, comments, and ads being displayed or clicked. Growth and engagement numbers and the crash rate are important to the company. If the crash rate fluctuates, we immediately take action to understand why.

Concerning rollback, we've never had a canary that bad. Generally, it's always rolling forward. We'll promote the release candidate to the production binary in our store, roll it to 5 percent of users, and get data back from that. If that 5 percent looks good, we'll roll it out to the rest of the population. I always make the analogy that it's like a bullet from a gun. It just keeps going. The mobile ecosystem is so broken when it comes to software management that I don't want to force people to re-download. Every time I have to ask them to re-download, I lose a certain percentage of people who just never do it. So that's the challenge.
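The first part of the canary Rossi describes, collecting canary log data separately from production traffic and flagging new or elevated error signatures, can be sketched as follows. This is a minimal illustration, not Facebook's actual tooling: the function name, thresholds, and sample data are all assumptions, and it combines the two alerting modes mentioned in the interview (absolute numbers and percentage rate) into one simple gate.

```python
def canary_alerts(prod_errors, canary_errors, prod_requests, canary_requests,
                  abs_threshold=100, ratio_threshold=2.0):
    """Return (signature, reason) pairs that should hold back a canary.

    prod_errors and canary_errors map an error signature to its count.
    Rates are normalized by request volume because the canary serves
    only a small slice of traffic.
    """
    alerts = []
    for sig, count in canary_errors.items():
        if sig not in prod_errors:
            # Any error never seen in the production logs is worth a look.
            alerts.append((sig, "new error"))
            continue
        canary_rate = count / canary_requests
        prod_rate = prod_errors[sig] / prod_requests
        # Gate on an absolute floor plus an elevated percentage rate.
        if count >= abs_threshold and canary_rate > ratio_threshold * prod_rate:
            alerts.append((sig, "elevated rate"))
    return alerts

# Hypothetical signatures: db_timeout is noisier in the canary but stays
# under the absolute floor, while login_loop never appeared in production.
prod = {"db_timeout": 500, "render_fail": 20}
canary = {"db_timeout": 90, "render_fail": 1, "login_loop": 12}
print(canary_alerts(prod, canary, prod_requests=1_000_000, canary_requests=10_000))
# -> [('login_loop', 'new error')]
```

The same gate generalizes to Rossi's 5 percent mobile rollout: hold the release at the current exposure level until the alert list comes back empty, then widen it.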
MARCH/APRIL 2015 | IEEE SOFTWARE

Moir: At Mozilla, release engineers don't monitor the quality of the release; we have a team called Release Management to perform that function. We use a "train model" for managing releases. When developers have a new feature, they'll land it on a certain branch and make sure the test suites run green. If so, the change set will be uplifted to another branch to ensure that the patch integrates with other changes on that branch and tests run green. Eventually, the new feature reaches the Aurora branch, which is an alpha branch, where it will sit for six weeks to bake; then it goes to the beta branch.

Concerning rollback, we don't really roll releases back. If there were a serious problem, like a huge number of crashes on a certain release, we would block it so that no updates would occur and then do a point release. For example, if there were a security issue causing problems, we would do a point release so that users wouldn't get the last release and would be automatically updated to the newer release with the security fix. We call this a "zero-day fix."

Amid all the hype and buzz about continuous delivery, what's cur-

Continuous deployment obviously shines in the Web area, where you own the ecosystem. You can publish effortlessly to your Web fleet, and your users get the fixes and features instantly without noticing it. In minutes or hours, you will know whether something is wrong with the release.

As I understand continuous delivery, it will not happen on mobile in the near future. The current app distribution system is based on an ancient model that's not even as good as shrink-wrapped software. In this model, you build an artifact, you put it out to a third party that has
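The gating behind the train model Moir describes, where a change advances toward release only while the test suites stay green, can be sketched as a simple promotion loop. The branch names follow Mozilla's channels at the time of the interview; the pipeline logic itself is an illustrative assumption, not Mozilla's release automation.

```python
# Order in which a change rides the train toward release.
TRAIN = ["mozilla-central", "aurora", "beta", "release"]

def ride_the_train(change, tests_green):
    """Advance a change along the train; stop at the first red test run.

    tests_green maps a branch name to whether the suites ran green there.
    Returns the furthest branch the change reached, or None.
    """
    reached = None
    for branch in TRAIN:
        if not tests_green.get(branch, False):
            break  # a red run blocks the uplift; the change stays put
        reached = branch
    return reached

# The feature integrates cleanly up to Aurora, where it bakes before beta.
print(ride_the_train("new-feature",
                     {"mozilla-central": True, "aurora": True, "beta": False}))
# -> aurora
```

Blocking a bad release and shipping a point release, the "zero-day fix," fits the same shape: rather than reversing the train, a new change set rides it forward on an accelerated schedule.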