Get More Capabilities for Every Storage Dollar
Customer Solution Case Study
Microsoft Uses Operating System to Triple Storage
Overview
Country or Region: United States
Industry: High tech & electronics

Customer Profile
Founded in 1975, Microsoft is a worldwide leader in software, services, and solutions that help people and businesses realize their full potential.

Business Situation
The Windows Build Team at Microsoft wanted a more cost-effective, resilient way to assemble the 2 petabytes of storage needed to deliver new Windows builds to quality assurance teams.

Solution
The team upgraded its Windows-build infrastructure to Windows Server 2012 to build a cost-effective, highly available, high-performance scalable storage solution using commodity hardware.

Benefits
Get more capabilities for every storage dollar
Increase test-server resilience
Improve performance
Speed development work

“With Storage Spaces, we have the functionality of a SAN without the cost of a SAN. We’re able to use commodity hardware to achieve the same functionality at a far lower cost.”
Jeremy Russell, Senior Development Lead, US Windows Engineering Team, Microsoft

The Windows Build Team at Microsoft is responsible for delivering current versions of the Windows operating system to development and test teams across the company. It needs 2 petabytes of storage to perform up to 40,000 Windows installations a day and had assembled a mix of traditional data center storage technologies for the job. However, costs were high and resilience and capacity in short supply. The team upgraded its Windows build environment to the Windows Server 2012 operating system and used the Storage Spaces feature to switch to industry-standard storage and consolidate its storage servers. By using Windows Server 2012 features like Storage Spaces and data deduplication, the Windows Build Team gets far more performance and resilience for its storage budget and has been able to triple storage capacity. All these benefits help Microsoft launch new versions of Windows sooner.

Situation
The Windows Build Team, part of the Windows engineering group at Microsoft, makes new versions of the latest Windows source code available to development and quality assurance (QA) teams across the Windows engineering team. The Windows Build Team compiles and builds 220 terabytes of data, which feeds 5,000 to 40,000 installs a day.

The team kept 2 petabytes of storage on hand for its work, but the cost was high. Roughly 1.5 petabytes was spread across 120 servers with local direct-attached storage, and 700 terabytes was in older storage area network (SAN) technology. Each of the 120 servers costs about US$25,000, so maintaining an optimal refresh frequency was a fiscal challenge. The SANs were also aging and expensive to maintain; any SAN outage or support incident cost up to $50,000 to resolve.

Resilience was another concern. While the team had no single point of failure with its distributed storage architecture, it had dozens of individual failure points. When a disk in one of the older servers failed, which happened about once a month, the developers or testers using it lost days of work. “When we lose a server, we lose the test content or the Windows image itself, which can set testing back a couple of days,” says Jeremy Russell, Senior Development Lead, US Windows Engineering Team at Microsoft. “If the debug symbol storage failed, for example, a developer had to spend time finding another build to use and install, which could take an hour or two, and then they could triage the break. With 5,000 developers, dozens of teams could be similarly impacted by any single outage.”

Capacity was another problem. Even though the team had more than 2 petabytes of storage, this was still not enough to support more than five days’ worth of data; internal teams wanted longer data retention, but budget constraints and the high cost of storage made that unfeasible. This meant that the image with the bug might not be available, which caused triaging and issue fixes to take longer.

Solution
The Windows Build Team decided to upgrade its Windows-build infrastructure to the Windows Server 2012 Datacenter operating system to take advantage of the Storage Spaces feature set. This feature set enables organizations to use industry-standard storage, notably “just a bunch of disks” (JBOD) devices, to build highly scalable, continuously available storage solutions at a substantial cost reduction.

The Windows Build Team used Storage Spaces to manage a RAID Inc. Serial Attached SCSI JBOD containing 60 disks. On these 60 disks are 75.5 terabytes of usable data storage. The team has 14 of these Storage Spaces servers, which provide 1.1 petabytes of storage, and will ultimately have 20 of these servers to provide a total of 1.5 petabytes of storage.

The Windows Build Team augments Storage Spaces with other Windows Server 2012 features to improve storage efficiency and availability. It uses the data deduplication feature to store data more efficiently. Data deduplication finds and removes duplicate data without compromising data integrity. This enables the team to store more data in less physical space, a big benefit for mirrored configurations, which reduce the amount of available storage space. “Our deduplication rates are at worst about 45 percent and at best 75 percent,” Russell says. “We can get up to 75 percent if we ensure greater data affinity by storing the same kind of data on the same servers, which is easy for us to do. So even in a worst-case scenario we recover almost all of the physical space lost due to mirroring and in the best case, deduplication increases the amount of data we can store.”

The team uses the Windows Server 2012 failover cluster feature to help protect data across its storage cluster. If one of the servers fails, another node in the cluster can take over its workload without any downtime. Also, the team is able to use the Windows PowerShell command-line interface and scripting language to manage its storage environment, rather than using specialized SAN management software. And it is able to use the Operations Manager component of Microsoft System Center 2012 to automatically monitor its storage system.

Benefits
By upgrading to Windows Server 2012 and taking advantage of Storage Spaces and related technologies, the Windows Build Team gains more storage, resilience, and performance for its storage budget and can retain more data for longer periods of time. These benefits help Microsoft launch new versions of Windows sooner.

Get More Capabilities for Every Storage Dollar
When its Windows Server 2012 Storage Spaces infrastructure is complete, the Windows Build Team will be able to reduce the total number of servers that manage storage from 120 to 20. Plus, it is able to buy more storage with its budget and gain new capabilities. “We are using data deduplication to optimize our storage and provide greater utilization and availability,” says Russell. “In one of our storage sets, I currently have 745 terabytes of data though I only have 453 terabytes of physical storage. We’re able to allocate more storage than we physically have.”

The team can scale out its storage infrastructure as needed without the need to buy expensive SANs and overprovision; it only buys hardware as needed. “With Storage Spaces, we have the functionality of a SAN without the cost of a SAN,” Russell says. “We’re able to use commodity hardware to achieve the same functionality at a far lower cost. We’re paying about $0.45 per gigabyte of storage versus the $1.35 per gigabyte that we were paying previously and getting more for our money. It was pretty impactful to my management that while I was given a certain budget to replace a third of my storage, I was able to replace all of it.”

Storage management costs are also lower, partly because the Windows Build Team has only 20 servers to manage instead of 120. Additionally, the team can now manage its storage environment using standard Windows tools such as Windows PowerShell and Operations Manager.

Increase Test-Server Resilience
While increasing test-content resiliency wasn’t a goal at the outset, it has been a serendipitous benefit of moving to Storage Spaces. It was very expensive for the team to build resilience into its distributed infrastructure, but by using Storage Spaces, the Windows Build Team has been able to consolidate its storage servers from 120 to 20, which reduces points of failure. “We have fewer single points of failure, and with Windows Server 2012 failover clusters, we are able to protect all those servers,” Russell says. “We can put all 20 machines on power backup, which was not practical with 120 servers, and provide a higher level of service to our internal customers. We can also deliver resiliency at the individual server level, which was not practical before.”

Improve Performance per Storage Dollar
The team has also gained more performance per gigabyte/dollar of storage. “With Storage Spaces, we get the same disk I/O on one machine that we got on 10 servers previously,” Russell says. “The only other way to get that level of performance would have been to invest in a high-end SAN. We’re getting 3.6 gigabytes of reads per second and almost 1.9 gigabytes of writes per second. Those numbers are close to the theoretical maximum of a dual-SAS connection.”

Speed Development Work, Product Launches
By using Storage Spaces and data deduplication, the Windows Build Team has been able to make far more test data available to Microsoft test and development teams. The team has increased debug symbol retention by three times and thinks it can ultimately realize a five-times symbol retention improvement. “When software bits are available to testers and developers for a longer period of time, they can fix code bugs faster and ultimately get Windows releases out the door sooner,” Russell says.

Windows Server 2012
Windows Server drives many of the world’s largest data centers, empowers small businesses around the world, and delivers value to organizations of all sizes in between. Building on this legacy, Windows Server 2012 redefines the category, delivering hundreds of new features and enhancements that span virtualization, networking, storage, user experience, cloud computing, automation, and more. Simply put, Windows Server 2012 helps you transform your IT operations to reduce costs and deliver a whole new level of business value.

For more information, visit:
www.microsoft.com/en-us/server-cloud/windows-server/2012-default.aspx
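The storage figures quoted in this case study can be cross-checked with some simple arithmetic. The Python sketch below is illustrative only and rests on assumptions the case study does not state: that "mirrored" means a two-way mirror (usable space is half of raw space), that a deduplication rate of 45 or 75 percent means that fraction of the data consumes no physical space, and that terabytes and petabytes are decimal units. The function and variable names are hypothetical, chosen for this example.

```python
# Back-of-the-envelope checks of figures quoted in the case study.
# ASSUMPTIONS (not stated in the source): two-way mirroring, decimal
# units (1 PB = 1000 TB), and dedup "rate" = fraction of space saved.

TB_PER_PB = 1000


def effective_capacity_ratio(dedup_savings: float, mirror_copies: int = 2) -> float:
    """Logical data storable per unit of raw disk after mirroring and dedup."""
    usable_fraction = 1 / mirror_copies          # mirroring stores every block N times
    return usable_fraction / (1 - dedup_savings)  # dedup shrinks what must be stored


# Worst and best deduplication rates quoted by Russell.
worst_case = effective_capacity_ratio(0.45)  # roughly 0.91 of raw capacity
best_case = effective_capacity_ratio(0.75)   # 2.0 times raw capacity

# "745 terabytes of data though I only have 453 terabytes of physical storage".
logical_to_physical = 745 / 453              # roughly 1.64

# Cost per gigabyte before and after the move to Storage Spaces.
cost_ratio = 1.35 / 0.45                     # roughly 3

# Capacity from 75.5 TB of usable storage per server.
current_pb = 14 * 75.5 / TB_PER_PB           # roughly 1.06 PB
planned_pb = 20 * 75.5 / TB_PER_PB           # roughly 1.51 PB

print(f"Worst-case logical capacity per raw TB: {worst_case:.2f}")
print(f"Best-case logical capacity per raw TB:  {best_case:.2f}")
print(f"Logical-to-physical ratio (745/453):    {logical_to_physical:.2f}")
print(f"Old cost per GB / new cost per GB:      {cost_ratio:.1f}")
print(f"Capacity with 14 servers: {current_pb:.2f} PB; with 20 servers: {planned_pb:.2f} PB")
```

Under these assumptions, the worst-case rate recovers about 91 percent of raw capacity, which matches the statement that deduplication recovers "almost all" of the space lost to mirroring. The best-case rate stores twice the raw capacity. The threefold cost ratio is consistent with replacing all of the storage on a budget intended for a third of it, and the server counts agree with the rounded 1.1 and 1.5 petabyte totals.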