From Chaos, through Fear, to Confidence.

It’s important for a business of any size to have confidence blamed for not communicating the new available development resource in order to environmental requirement and placed on investigate and close the issues. Half way in the software it produces; from a start-up focused on performance review. A mandate is made through the month the next deployment rapidly delivering new features, to a large enterprise that all developers must communicate such is postponed by two weeks because none requirements into a run-book used by the of the features in development are near that prioritises performance and stability. At each end of individual doing the deployment. completion and another two weeks later the team still believes there are two more the spectrum future growth (or defending your current The Business also decides that to mitigate weeks of work to be done to get to feature the risk to income due to an outage caused position) depends on releasing working software at a complete. cadence which meets the demands of the marketplace. by a deployment, all future deployments must be made at 2am on a Sunday morning At this point there hasn’t been a new and a new Ops team is formed from the deployment in two months, questions are Many organisations don’t have rigorous Application Lifecycle existing group of developers who are now being asked higher up in The Business about Management processes in place – they have evolved to their responsible for the nightshift at the weekend the development team’s capability and status quo based on heroic effort, a modicum of arrogance to do the deployments. The morale of the voices of concern are being raised about and a touch of luck. The problem with this approach is development team has taken a nose dive how “fit for purpose” the existing platform that it works…until it doesn’t; everything will be just fine and individuals are worried about being is. Something has to change, but no-one is until things start going inexplicably wrong. Suddenly, bugs identified and punished for causing issues in quite sure what is needed. in the production environment can’t be replicated in the the production environment in the future. The above is a fictional, generalised account development environment; or deployments that used to The new Ops team are worried that they will of several real world projects we have been take just a few minutes of manually copying binaries now be blamed for failures of omission by the engaged to help rectify. It is a common seem to take hours because new features require a series of development team. environment tweaks to run properly. situation that many, many companies, both A month or so later, the next planned large and small, sleepwalk into. Then, one day, code that ran in the development release is postponed because the scheduled What seem like common sense decisions, environment, once deployed, doesn’t run in the production work has not been completed. Two weeks such as creating Development and environment, and the site goes down. This deployment later, after much pressure is applied from Operational teams end up creating walled can’t be rolled back because the developer responsible The Business, the development team gardens where information flow is stunted for the deployment forgot to back up the original binaries. grudgingly agree that they are feature and a culture of blame and blame mitigation After some frantic debugging, with a manager pacing complete and ready to do a deployment at is fostered, under a general cloud of fear to-and-fro in the background, it is discovered that a the weekend. new feature requires an environmental tweak that hasn’t and opaque processes. This is precisely the been documented. The deployment goes without a hitch – type of situation the DevOps movement is new environmental tweaks required are trying to address and rectify. Once the site is back up and a calculation has been made documented in the run-book; but over of the amount of money lost by the outage, an all hands the next couple of weeks the number of development meeting is called to figure out what went calls to customer support have increased. wrong and how to stop it from happening again. The The large number of support requests outcome of the meeting is that one of the developers is start to absorb a high percentage of the Within its first 4 years, RecruitmentGenius.com has elastically scale to meet peak demands, is implementation of branching unwieldy, so the story of the 18 month journey endjin had decided to carry out all development filled over 60,000 jobs for 9,000 companies. To deliver embarked on to help Recruitment Genius on the main code branch. The Team this volume of growth they had to aggressively develop move from chaos to confidence; Foundation Server was also hosted on a fragile virtual server which had developed a Before a line of code was changed, endjin their online platform. Like any good start-up they rapidly habit of crashing, which left the developers ran a workshop to perform an end to end, needed to adapt their software architecture and product in a disconnected state, unable to commit drains up review of the existing platform their code and collaborate; they had with Recruitment Genius’ development feature-set to take advantage of commercial opportunities developed a sceptical and untrusting team. Several interesting issues were as they arose. This strategy proved to be very successful relationship with their tools. revealed. Among them was a realisation and helped them grow the business to become the UK’s that architectural drift has occurred; there For ease of development, every project No. 1 online recruitment agency in 2013. was a disparity between what the team was included in a single solution file, which believed were the processes implemented stressed even the most high specification Planning the next phase in Recruitment by the system, versus what the code was development rig. A clean build took around Genius’ life, from start-up to SME, The actually doing. 12 minutes to run and the solution was Board of Directors wanted to invest in a lacking test coverage. Manual tests were Next endjin embedded with the team new platform, designed to harness all the carried out against the development to observe how “work” flowed through competitive advantages cloud computing environment. their development process, from has to offer, to enable them to offer new initial requirement inception, through Because there were no defined acceptance products and services to their customers in development and into production. This criteria, development carried on until it was a timely fashion, while keeping a lean and revealed that they were working in a highly declared “good enough” and then it was agile mentality. They had noticed that not chaotic fashion; requirements were often released into production. only did it seem that the last few feature partially formed ideas that didn’t clarify releases had been harder to deliver, as Deployments were carried out by the end goal state or acceptance criteria. if their process had developed a level of performing a build on a local development friction that was affecting their ability to The specifications of feature requests were machine. The build artefacts were then ship new features, but there had also be often framed in terms of the underlying transferred over a remote desktop an increased occurrence of issues in the database constraints and how this affected connection to the production server. production environment that had raised the code in layers above, rather than Deployments were undertaken with a concerns about the way they managed the considering the desired end user experience certain amount of trepidation as there quality of the output from development. and working down through the layers to had been a history of deployment issues. achieve it. Following a recommendation from one Having reviewed the production process, of endjin’s other customers, Recruitment Because the requirements weren’t fleshed endjin spent some time with the Genius approached endjin to assist them out it was difficult to estimate the work management team and sought information with their platform migration to Windows involved in order to deliver the feature on two fronts: how new work is conceived Azure and to help improve their end-to-end request. The development team always and prioritised; and how development software development processes. delivered, but they were expending heroic progress is reported to The Board. levels of effort to do so. More fascinating than the technical details After some time to digest and reflect of the migration and the use of cloud The team was using Team Foundation upon all the information gathered, endjin technologies to enable a business to Server , but found its proposed two concurrent work-streams. The first was a re-engineering effort to migrate the platform to Windows Azure. The second was an Application Lifecycle Management programme to overhaul the delivery practices to focus on delivering business value through code quality, along with a new RAG format for board reporting of development activity. The journey can be broken down into the Developers following steps: gitflow Step 1: Restructure the source Work Item code tree Management The first step was to create some order and reasoning behind the layout of the Ideas The Board source code tree, so that members of the development team know where code, build artefacts and binary dependencies should be stored. This should prevent the source Orchestrate Update State Create Issue tree from becoming more chaotic over time.

Step 2: Split up the solution TeamCity YouTrack

Over 4 years, dozens of projects had been Selenium Tests Deploy Create Issue created inside a single solution; the sheer volume of code put a huge amount of Monitoring strain on the IDE and generally slowed the developers down. One of the tacit aims was to make adding any new feature, or making a change, as fast as when the organisation was a young start up.

The solution was split up into five smaller logically separate solutions, this significantly reduced the time to open & compile the Feedback to Customer Support solutions. Once in the smaller solutions, aggressive refactoring was carried out (with the aid of ReSharper) to enforce standard 18 months into the journey, the new Application Lifecycle Management architectural principals (layering, single process has been encapsulated by the following tools and platforms that responsibility, separation of concerns). User have be configured to allow the frictionless flow of work through the Redundant code was extracted from the system. This quickly provides feedback about the various quality gates and solution and archived. enables rapid deployment to any environment. Step 3: Migrate solution to GitHub One of the other major behavioural attitudes How do you: of work, and no way to capture estimates or it helps to eliminate is the “it works on my track progress. • Keep your repository tidy? Once the solution has been refactored, machine” mentality – because a TeamCity the next step was to pick a reliable version YouTrack was selected because of its low builds starts in a pristine state it will • Have consistency between projects? friction approach to creating and tracking control system that the development team expose any missing files, or old versions of work items, and also for its integration with could depend upon. GitHub was chosen, binary dependencies that contaminate a • Get your developers up to speed TeamCity. When a developer committed not only for its great branching support developer’s machine. quickly with our procedures? and useful features such as the ability to a code change, they could enter the ID of code review and comment on commits, TeamCity was installed and configured to • Enable the team develop a feature the item they were working on; this allowed and a wiki for storing information about the automatically build each of the solutions, in isolation from the main branch, anyone looking at TeamCity to click a link system, but also because it was an externally every time a change was made. Additional but also allow them to collaborate? and go directly to the item in YouTrack. hosted solution, which meant there was one build steps for finding code duplicatesand • Stabilize before a production release? Endjin ran a backlog creation session less internal system to support. running code coverage reports on unit test where the whole team collaborated on run and build features, such as AssemblyInfo • Manage production releases? gathering existing feature ideas and entering Step 4: Introduce MSBuild based Patcher and Build Files Cleaner were them into YouTrack as a single backlog. automated build framework configured, which help improve the overall • Manage production hotfixes? The Product Owner transcribed product quality of the solution. We replaced the error prone, manual build The solution GitFlow proposes is a & feature ideas that were stored in multiple steps with BuildFx, endjin’s MSBuild-based With TeamCity 8.1 pre-tested commit prescriptive branching model, born of different locations (note books, draft automated build framework process. functionality has been added for via experience, which answers the questions emails, spreadsheets, and documents) Executing any unit or integration tests the new Automatic Merge build feature. above, and a set of Git Extensions that in to YouTrack so that there was a single as part of the build also made a huge This is more commonly known as a gated allow you to manage and automate work product backlog that could be easily difference to the developers’ confidence check-in and means that changes are built through that model. groomed and prioritised. levels. They know that by running a single on a private branch, and those changes are TeamCity enables GitFlow support via its command file, within a few minutes they merged into the main branch only if the Step 8: Introduce User Stories and branch specification feature; TeamCity can do a full clean build and test run, and build finishes successfully. This ensures that Behaviour Driven Development will automatically monitor each of the all the build artefacts are packaged into a other members of the team don’t have to dynamically created GitFlow branches Once there was a unified backlog of work, single location, ready for deployment. deal with broken builds as the main branch and build any changes committed to that the next step was to convert them from should always be in an unbroken state. Step 5: Install TeamCity branch. generalised feature requests to user stories that explain the actors involved, the context Step 6: Introduce GitFlow For a more detailed description of Gitflow The next logical step from having an and reasoning behind the feature and a set and configuring your TeamCity environment automated build process that developers Now the foundations required for providing of acceptance criteria. see the following blog posts: http://blogs. can run locally, is to ensure that this build a feedback loop to developers about the endjin.com/tag/gitflow/ This change helped on a number of fronts. process is executed every time a developer quality of the code they are committing It helped the developers understand the commits a code change. If a code change has been established, the next step was Step 7: Install YouTrack reasoning behind some of the feature causes a build failure, everyone should be to reduce the friction and complexity of requests; it started to instil the notion of notified and a fix should be applied. The collaboration on a single codebase. Once the main engineering practices had the personas using the platform, and make most common problems it helps to detect GitFlow sets out to answer the following been established, it was time to focus developers think about implementing are when developers fail to commit all files questions that plague every software on the overall development process & features from a user experience in a change set, or when a seemingly small development team: methodology for creating, elaborating and perspective: from the user interface down change affects the behaviour in another part delivering work. The existing process relied to the database, rather than the traditional of the system and causes a unit test to fail. heavily on email. There was no overall list approach which limits the experience based on the constraints made by underlying Step 10: Introduce automated team decided they wanted to create an semantic logging into the platform, which technology choices. deployments via TeamCity extra quality gate: a series of end-to- allowed them to reconstruct the context of end UI automation tests that would run any issue in the test environment in order Once the team understood how to articulate Work was starting to flow rapidly through on demand. By this point the team had to diagnose, debug and fix the issues. An requirements in terms of user stories, the the system, so the next bottleneck to adopted SpecFlow for all integration based additional build step was added to the next skill to adopt was the translation of tackle was the ability to rapidly deploy to specifications and endjin build a custom deployment process to notify New Relic the user stories into a series of executable any given environment. Endjin integrated SpecFlow plugin which allowed the that a new deployment had occurred – this specifications that the developers could their DeployFX for Windows Azure developers to tag different scenarios with enabled the development team to observe create to help them elaborate what framework, which is responsible for both the a list of browsers and platforms that wanted longitudinal data to see if any specific behaviours were required to deliver a user configuration management and deployment to run the tests against. The SpecFlow deployment caused performance issues. story. These executable specifications were process. It was originally integrated as a plugin is available at: https://github.com/ written in a mixture of MSpec and SpecFlow, Errors raised by New Relic were turned into build step, but since TeamCity 8.x it has endjin/Endjin.Selenium.SpecFlowPlugin both of which were run by TeamCity as been packaged up as a Meta Runner. backlog items and entered into YouTrack; part of the process. Rather than manage their own browser these could be prioritised at the start of Code coverage reports were used to identify To enable automated deployments, the test environment, the development team each iteration depending on their level of areas of code that were not exercised by the development team made changes to the decided to use Sauce Labs a cloud based urgency. Problems that customers reported, existing specifications, indicating that some platforms’ configuration files, so they could Selenium test infrastructure SaaS product. were also captured by the customer support scenarios and edge cases were missing and be automatically updated by the deployment Sauce Labs will automatically provision staff and were also captured and prioritised would need to be dealt with. process depending on which environment your browser / OS combination, run your in the same manner. had been selected as the deployment test suite against your web application These two feedback loops ensured that Step 9: Introduce Iterative & target. The TeamCity continuous integration and report the results. Videos of each test problems were captured, prioritised, fixed Incremental development process was also modified to publish run are recorded to allow you to diagnose and deployed rapidly, ensuring the platform deployable build artefacts that DeployFx for any test failures. TeamCity was used to Now that they had a backlog of well continually improved, became faster and Windows Azure could consume and deploy. orchestrate the running of these tests; understood work, and a process for turning more stable as time goes on, rather than the results are surfaced as standard unit those ideas into provable code, the next Deployments were now so straight forward the more common occurrence of long test results showing success and failure. thing to tackle was to evolve the mind-set that any member of the team (with the term performance degradation. from developing a big stack of features and correct security permissions) could Sauce Labs have released a TeamCity releasing them in one go every month or instigate a deployment; developer, tester plugin that allows orchestration of so, to building iterative and incremental or product owner. Because deployments Selenium Tests and will also gather test changes and releasing them as quickly as now took minutes, dozens were performed result videos and screenshots and enables they pass through defined quality gates. This a day, which rapidly reduced the fear TeamCity to publish them as build artefacts: is both a behavioural and mind set change. that had been associated with manual https://github.com/rossrowe/sauce- The process was instigated by using the deployments and increased confidence teamcity-plugin Agile Board features of YouTrack to plan that each production deployment would what work was going to be delivered on work perfectly. Step 12: Install NewRelic for a weekly basis. It allows easy visualisation application monitoring of what the team is working on, how Step 11: Introduce Web UI The final feedback loop created was close work themes are to being complete Automation with Selenium extensive application monitoring of the and whether they are in a position to and Sauce Labs platform. New Relic was chosen because of be deployed (changes moving from a Now that changes could automatically its tight integration with the Windows Azure feature branch to the main release branch be deployed into any environment, the Platform. The development team introduced in GitFlow). Who are Recruitment Genius?

Recruitment Genius has rapidly grown to become the UK’s largest online flat-fee recruitment company. Their business model has helped fill over 59,704 jobs for over 8,800 UK companies for an astonishing flat-fee of only £199. Many well-known brands such as Tesco, Shell, Sony Playstation and Ford use Recruitment Genius, as well as many other small but growing companies, It was frustrating before as ideas would get lost in an ether of to place everyone from board level to “communication; actioned in no particular order; lacked robust shop floor. recruitmentgenius.com testing, and because it took so long to deploy we were forced Who are endjin?

to do waterfall releases. Established in 2010, endjin has become But since Endjin automated everything, all our ideas are the number one Microsoft Windows Azure Partner in EMEA, delivering projects using logged and prioritised using YouTrack and we avoid a mess cloud, mobile, desktop and embedded of email communication as the team collaborate together technologies in retail, financial services, in one central place. and consumer applications. endjin.com GitFlow ensures our developers don’t tread on each other’s Technologies used toes and we are in control of which features make it into any given release. TeamCity takes care of automated testing and deployment to Azure in minutes rather than days. Automating these tasks ensure we rapidly deliver product enhancements to our customers. The outcome is our developers are happier, our customers are happier and that makes my life as a product owner much easier.

Geoff Newman“ Managing Director, Recruitment Genius

Strategy Experience Development Cloud

2 Leathermarket Street, London, SE1 3HN t 020 8720 7287 e [email protected] endjin.com