The Team Working on the Migration Had Several Requirements That the Migration Process Had

Windows Azure Customer Solution Case Study

Microsoft Moves Websites to Windows Azure, Maximizes Resource Utilization, Reduces Costs

Overview
Country or Region: United States
Industry: Software engineering

Customer Profile
Based in Redmond, Washington, Microsoft is a worldwide leader in software, services, and Internet technologies. It employs roughly 90,000 people and operates 112 country subsidiaries.

Business Situation
The Enabling Platform Experience group at Microsoft wanted to start migrating its larger websites to take advantage of the benefits of a cloud services environment, such as reliability and scalability.

Solution
Microsoft decided to move two of its largest websites, Microsoft TechNet and Microsoft Developer Network, from an entirely on-premises infrastructure to the Windows Azure cloud services environment.

Benefits
Maximized resources
Gained greater scalability
Maintained high performance
Reduced infrastructure and maintenance costs

“Since migrating Microsoft TechNet to Windows Azure, we have achieved more scalability, maximized resources, and maintained high performance.”
Purush Vankireddy, Director, Service Engineering, Microsoft

Microsoft wanted to migrate its Microsoft TechNet and Microsoft Developer Network websites to a cloud environment without necessitating any code or architectural changes. It wanted the solution to provide performance that was equivalent to or better than its on-premises solution and that would allow ease of operations. The decision, implemented by the Enabling Platform Experience (EPX) group, was to move the websites to the Windows Azure cloud services environment. In 2011, the EPX group began the migration of Microsoft TechNet. Microsoft now enjoys a solution for the website that scales dynamically to meet demand, provides fast performance, and minimizes on-premises infrastructure and costs. By using Windows Azure, the EPX group will be able to reduce its forecasted server acquisitions by 20 percent.

Situation
In June 2011, the Enabling Platform Experience (EPX) group, part of the Microsoft Developer Division, began a journey that would see two of its largest developer and IT professional websites migrated from an entirely on-premises infrastructure to take advantage of the reliability, scalability, and availability of the cloud environment.

The team working on the migration had several requirements that the migration process had to meet:

No code or architecture changes. The migration should be accomplished with minimal changes to configuration and with no changes at all to the architecture or code of the applications. By eliminating the need to re-architect the application to run in the cloud, this approach would enable a much faster and less expensive migration.

Equivalent or better performance. Performance of the migrated application must be equivalent to or better than the on-premises solution. In particular, it must be able to scale dynamically to meet demand, while minimizing running costs.

Ease of operation. The migrated application must be easy to operate, monitor, manage, and maintain.

Reduced on-premises requirements. The result must provide opportunities to minimize on-premises infrastructure requirements and costs.

The EPX group at Microsoft is responsible for managing a number of Microsoft Developer online and offline experiences, including the Microsoft Developer Network (MSDN) and Microsoft TechNet. Usage of both of these sites is highly variable, with significant spikes when new products launch or during training or conferences. “To support this kind of variability along with data center redundancy, we had to provision three times the capacity required to support peak volumes, resulting in an aggregate server utilization of 20 percent over the course of a year,” says Jay Jensen, Sr. Systems Engineer, EPX group at Microsoft.

Solution
Microsoft decided that the first phase of the migration would be to move Microsoft TechNet to Windows Azure, the Microsoft cloud services development, hosting, and management environment. Windows Azure provides on-demand compute, storage, networking, and content delivery capabilities through Microsoft data centers around the world. The MSDN and TechNet websites have a similar architecture and are hosted on the same hardware. Because TechNet receives less traffic overall than MSDN, the decision was made to perform the migration for TechNet first, and then apply the lessons learned to the MSDN website. The midterm target for the EPX group is to move all applicable websites and applications to Windows Azure by the end of 2014.

Architecture of TechNet Before Migration
TechNet is designed for high web traffic. It experiences a high number of reads with no caching on the front-end web servers.

TechNet has an average traffic level of more than 30 million unique users a month, in addition to other requests such as indexing by search engines.

TechNet has a typical two-tier architecture, with web servers already in a virtualized environment. Visitors to TechNet access web front ends that, in turn, read content from a farm of database servers. This database layer is primarily hosted on high-end, four-terabyte servers with the content replicated in four different data centers. (Figure 1 illustrates the original on-premises architecture of TechNet.)

Content for the site is pushed to the databases from a content publishing system. A complicating factor in the migration was that TechNet was in the midst of a transition from one content management system to another, which meant that two code bases were sourcing the content for end users.

Migration Process
Microsoft engaged Accenture, a member of the Microsoft Partner Network, for the initial evaluation and feasibility assessment utilizing its Azure Migration Accelerator assets and the Premier Field Engineering (PFE) teams, who have vast experience in debugging and troubleshooting performance issues. Then, in June 2011, a team of just three service engineers embarked on the migration within an aggressive time frame.

The final design for the overall migration of all sites is a hybrid application where a portion of the application runs on Windows Azure, while the data layer, monitoring, and management functions, and the content publishing system remain on-premises. The result is a future-state architecture to which the team can migrate applications over time through gradual re-architecting.

To move the TechNet infrastructure to the cloud environment, the team made design decisions at each layer of the infrastructure. For traffic routing, the team utilized TechNet’s current global load balancing capabilities using Akamai networking services to direct traffic to Windows Azure. This enabled the team to pilot the approach with live traffic and divert incremental amounts of traffic from on-premises data centers.

For the web front ends, the team used the Windows Azure Virtual Machine (VM) role, which enabled them to use an existing VM image to seed the cloud migration. This reduced the probability that engineering changes would be required and provided the team with more control over the migration and configuration. Because of the two code bases, the team packaged each one into its own VM role and used a custom content-switching solution to drive traffic to the appropriate role.

To achieve elasticity and the consequent minimization of runtime costs, the team chose the Enterprise Library Autoscaling Application Block from the Microsoft patterns & practices group. This enabled the application capacity to automatically mirror demand by starting and stopping role instances in response to a range of factors, such as server load and resource usage. Incorporating the Autoscaling Application Block and configuring the autoscaling rules took the team just a few

hours. “The Autoscaling Application Block allows MSDN and TechNet to automatically handle changes in the load levels over time. It helps minimize operational costs, while still providing excellent performance and availability to our users,” says Dr. Grigori Melnik, Sr. Program Manager, Microsoft patterns & practices group.

Databases were another key consideration during the migration. Because the TechNet content database is almost four terabytes, SQL Azure was not a viable option in the short term. Instead, the team created a hybrid cloud solution in which the web front end resides on Windows Azure and the data tier remains on-premises.

For content switching, the on-premises platform used a hardware-based solution. For the migrated solution, the team created an Application Request Routing (ARR) role to achieve the same results. (Figure 2 illustrates the hybrid application architecture at the time this case study was written.)

To achieve network connectivity and authentication/authorization between the on-premises databases and cloud-hosted web front ends, the team chose to adopt a solution provided by Microsoft Global Foundation Services (GFS). GFS is the engine that powers the infrastructure and many services for Microsoft global data centers. The GFS Services for Azure Applications (GSAA) team provides a solution that uses a Windows Azure plug-in or a base VM role image to manage traffic heuristics. It provides a framework that meets all of the medium business impact (MBI) requirements for Windows Azure data and GFS, which allows data to reside in and pass through the cloud environment.

GSAA also provides a new domain named azr.gbl (the Windows Azure domain) that has trust relationships with the Microsoft internal on-premises domain that hosts many online services. This enables the use of integrated authentication. With all of these controls in place, the GFS network team allows connectivity among Windows Azure hosted services and between internal systems and hosted services. The GSAA solution enables Windows Azure hosted services to interact seamlessly with Microsoft internal hosted services over GFS’s world-class network, without making intrusive security or network changes.

Performance Comparisons
A major goal for the move to Windows Azure was to maintain or improve performance for the migrated applications. The EPX Performance and Reliability team compared the performance of the original on-premises sites and the TechNet sites hosted by Windows Azure.

The charts in Figure 3 illustrate the page load time for the initial user experience (PLT1) for pages served from four regional data centers. Page load times are measured above fold time (AFT). Fold time is the point at which the browser clears the current page and starts to load the new page. Measuring performance from this point ensures that results are not skewed by factors such as DNS resolution time and proxy server negotiation.

For both the first-time user experience and the returning user experience, the difference in page load times was less than

200 milliseconds for the majority of pages, which is within the margin of acceptable performance. The difference is mainly due to latency of the Content Delivery Network (CDN) and advertisement delivery.

A few pages exhibited differences of up to 400 milliseconds in some regional data centers; the team is investigating individual performance improvements for these cases. However, extensive performance and reliability testing has shown that the overall performance after migration to Windows Azure is equivalent to the on-premises applications, and in some cases better for certain pages.

Proposed Near-Term Implementation
The EPX team is now pursuing initiatives aimed at providing a future architecture for TechNet and other websites and applications hosted on Windows Azure. The proposed architectural changes include:

Migration of the web front ends from VM roles to web roles in order to reduce support requirements, remove the need to manage the operating system, and to simplify deployment.

Migration of the databases to Windows Azure using the new Infrastructure as a Service (IaaS) capabilities.

Use of an on-premises virtual private cloud implemented with the Windows Server 8 operating system to allow content to be published from on-premises servers to Windows Azure.

(Figure 4 illustrates the proposed future architecture.)

Lessons Learned
The experience gained from the initial TechNet migration will be used as a template for the migration of other EPX sites. The initial migration of the TechNet website to Windows Azure provided the EPX group and other teams at Microsoft with many useful pointers for the future. For example, the assessment showed that numerous operational processes would need to change as EPX transformed to support cloud-based solutions. These processes include:

Logging and monitoring. Windows Azure Diagnostics transfers Windows logs and other trace information to Blob Storage as scheduled jobs. Data must be downloaded from Blob Storage, and the team chose Microsoft System Center Operations Manager and Virtual IP (VIP) monitoring tool (an internal HTTP monitoring solution) for this task. Local instance health checks are also performed, while third-party providers monitor application pages (Keynote) and perform network traffic management.

Business continuity and disaster recovery (BCDR). Existing traffic management capabilities plus local instance health checks of pages enable a failover to or from Windows Azure at the cluster level. The health check pages incorporate functionality to test for issues such as loss of data layer connectivity.

Backup and restore. Existing systems manage backup and restore for on-premises data. Specific backup and restore facilities are not required in the cloud-hosted portion of the application

because Windows Azure automatically replicates data, such as the log information persisted in Blob Storage.

Operating system updates. A service engineer connects to a “golden master” VM and applies operating system and security updates using msnpatch.exe, and then uses an automated deployment process to publish the VMs in Windows Azure.

Deployment. Operating system and Internet Information Services (IIS) updates are applied to a differencing disk. This is deployed to Windows Azure staging and a VIP Swap occurs to move it into live production. When minor changes are required, additional scripts can be used to push content deployments independently to each running Windows Azure role instance.

Guidelines for the Future
The ongoing migration has also revealed some useful general guidance for future migrations, which will benefit all designers and developers considering migration of their applications and websites to Windows Azure.

No need to reinvent the wheel: explore and apply known good practices.

Consider application and data security. Remember that Windows Azure is a public space.

Understand the capabilities and limitations of Windows Azure, outsourcing migration if required. Use the resources available on the Windows Azure portal, its forums and user groups.

Use available tools to evaluate code against known Platform as a Service (PaaS) migration challenges.

Understand the application and its potential risk areas. These may include server-specific configurations, special networking requirements, such as content switching or affinity, support for multiple sites, and connectivity to supporting systems or business layers.

Gain operational flexibility by allowing configuration and content to be modified independently from the package or VM that is deployed.

Take full advantage of Windows Azure services, such as Windows Azure Service Bus and SQL Azure Data Sync, and tools or frameworks, such as the Microsoft patterns & practices Enterprise Library Extensions for Windows Azure.

With any migration project, issues may occur that will only be discovered when something doesn’t work quite as anticipated. Some issues that the team came across were:

Note the time used. Windows Azure is always in Coordinated Universal Time (UTC), while on-premises services are likely to use local time.

Consider whether it is necessary to change the page size for web and worker roles based on the size of the role instance and application.

Always use the latest software development kit (SDK) version when developing applications and consult the Known Issues pages when upgrading the SDK version.

Currently, the EPX development team has migrated 40 percent of TechNet and MSDN traffic to Windows Azure, utilizing the design configuration described above. “By approaching the migration as an infrastructure migration, with no core application code or architecture changes, we reduced the effort and testing required and completed the initial migration within three months of completing the feasibility assessment,” says Jay Arvin, Systems Engineer, EPX group at Microsoft. Since then, TechNet has been running for more than 60 days on Windows Azure and three more sites are lined up for migration. The migration of TechNet traffic will enable the EPX group to reduce its forecasted server acquisitions by 20 percent.

Benefits
By proceeding with the migration of two of its largest websites to Windows Azure, Microsoft has achieved a number of significant benefits. In terms of reliability, scalability, availability, and minimizing costs, the migration proves that Windows Azure works, and works exceedingly well.

“Since migrating Microsoft TechNet to Windows Azure, we have achieved more scalability, maximized resources, and maintained high performance,” says Purush Vankireddy, Service Engineering Director, EPX group at Microsoft.

Maximized Resources and Improved Scalability
The requirement to meet highly variable traffic patterns meant that the overall server utilization used to fall to 20 percent. However, the over-provisioning was necessary to meet demand during busy periods. With the cloud-hosted solution, the EPX group now has elasticity through the easy addition and removal of servers to meet demand.

“With the migration of TechNet and MSDN to Windows Azure, we are able to bring down server acquisitions by 20 percent,” says Saravanan Vinayagam, Service Engineering Manager, EPX group at Microsoft.

Maintained High Performance
The benefits have been achieved without adverse effects on performance and service availability to website visitors. Early reviews of performance at the client for the solution hosted on Windows Azure, compared to the original on-premises deployment, show that the two are statistically equivalent for all pages when using local resources. In other words, the Windows Azure solution may be slightly faster or only slightly slower than the on-premises solution, depending upon the normal variations in Internet traffic.

“We are very pleased with the comparison numbers between the performance of the on-premises solution and the hybrid solution hosted by Windows Azure,” says Roopa Venkatasubramanya, Performance Lead at Microsoft.

Reduced Infrastructure and Maintenance Costs
Every organization is looking for ways to reduce energy usage and cost and to

minimize investment in infrastructure. The use of hosted virtual servers can minimize initial and ongoing hardware, infrastructure, and maintenance costs and achieve significant savings in day-to-day running costs.

Ultimately, through the full migration of all on-premises MSDN and TechNet web front ends to Windows Azure, estimates suggest that Microsoft could save between 18 percent and 25 percent on hosting costs. The benefits become even more compelling when considering the reduction in cost and management associated with spikes in capacity. Dynamic scaling capabilities and simple configuration changes can change the number of role instances deployed in Windows Azure.

Reduction in costs has another important aspect. A significant environmental (and often regulatory) focus for all companies today is to minimize their carbon footprint. Microsoft has found that hosted solutions that support dynamic resource scaling help achieve a significant reduction in energy usage and help it to meet emissions targets.

Utilized Migration for Learning
Microsoft will move many of its websites and applications to Windows Azure over time. The knowledge and experience gained during the initial phase of migrating Microsoft TechNet will be invaluable for the migration of other applications.

“Initial migration of MSDN/TechNet not only helped us to improve the scale and reliability, we learned many things that will help our business leverage cloud infrastructure in the near future,” says Lori Brownell, General Manager, EPX group at Microsoft. “We recommend using all the technical resources that you have available, including using tools to evaluate code and taking advantage of Windows Azure services,” she says.

Windows Azure
Windows Azure provides developers the functionality to build applications that span from consumer to enterprise scenarios. The key components of Windows Azure are:

Windows Azure. Windows Azure is the development, service hosting, and service management environment for Windows Azure. It provides developers with on-demand compute, storage, bandwidth, content delivery, middleware, and marketplace capabilities to build, host, and scale web applications through Microsoft data centers.

SQL Azure. SQL Azure is a self-managed, multitenant relational cloud database service built on Microsoft SQL Server technologies. It provides built-in high availability, fault tolerance, and scale-out database capabilities, as well as cloud-based data synchronization and reporting, to build custom enterprise and web applications and extend the reach of data assets.
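
As an illustration of the demand-driven scaling this case study describes (starting and stopping role instances in response to server load), the following is a minimal sketch of a threshold rule. It is not the Autoscaling Application Block's API, which is a .NET component; the Python form, the `ScalingRule` and `target_instance_count` names, and every threshold and bound are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical reactive rule: keep average load inside a target band."""
    scale_out_above: float = 0.75  # assumed threshold: add an instance above this load
    scale_in_below: float = 0.30   # assumed threshold: remove an instance below this load
    min_instances: int = 2         # assumed floor, kept for redundancy
    max_instances: int = 20        # assumed ceiling, caps running costs


def target_instance_count(current: int, avg_load: float, rule: ScalingRule) -> int:
    """Return the instance count the rule asks for, moving one step at a time."""
    if avg_load > rule.scale_out_above:
        desired = current + 1
    elif avg_load < rule.scale_in_below:
        desired = current - 1
    else:
        desired = current
    # Clamp the result to the configured bounds.
    return max(rule.min_instances, min(rule.max_instances, desired))
```

With these assumed values, four instances at 90 percent average load would grow to five, and four instances at 10 percent would shrink to three, while the floor and ceiling prevent the count from leaving the configured range.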

To learn more, visit: www.windowsazure.com www.sqlazure.com
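
The time-zone issue noted among the migration lessons (Windows Azure runs in UTC, while on-premises services are likely to use local time) can be pictured with a short sketch. It assumes, purely for illustration, an on-premises server that records naive timestamps at a fixed UTC-8 offset; the `to_utc` helper and `ON_PREM_OFFSET` constant are hypothetical names, and a real deployment would likely need a full time-zone database to handle daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Assumed for illustration: the on-premises server logs at a fixed UTC-8 offset.
ON_PREM_OFFSET = timezone(timedelta(hours=-8))


def to_utc(local_naive: datetime, local_tz: timezone = ON_PREM_OFFSET) -> datetime:
    """Attach the on-premises offset to a naive timestamp and convert it to UTC."""
    return local_naive.replace(tzinfo=local_tz).astimezone(timezone.utc)


on_prem_stamp = datetime(2011, 6, 15, 23, 30)  # naive local time from an on-premises log
cloud_stamp = to_utc(on_prem_stamp)
print(cloud_stamp.isoformat())  # prints 2011-06-16T07:30:00+00:00
```

Normalizing timestamps to UTC before comparing on-premises and cloud-hosted logs avoids the apparent multi-hour gaps that mixed local and UTC times would otherwise produce.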
