A Primer on Nearline and Archival Storage Solutions
WHITE PAPER

Copyright © 2002 KOM NETWORKS Inc., All Rights Reserved
http://www.komnetworks.com

Nearline storage occupies the position in the storage hierarchy between online and offline storage. Data on nearline storage is accessible almost instantaneously through automated, robotics-based handling of removable media. No human intervention is required.

Explosive Data Growth

The need for nearline storage has been driven by the phenomenal growth of electronic data in recent years. Applications such as e-mail, multimedia, databases and e-commerce contribute to explosive data growth at the enterprise level. Organizations are now doubling the amount of their data every 12 to 18 months. This ongoing need to store more and more data creates additional inefficiencies, because the storage management infrastructure cannot keep up with the growth in stored data.

Figure 1 – The rise of electronic data growth (data source IDC and SNIA)

Concerns and Issues

Backup Issues

The explosive growth in data results in longer backup cycles for the simple reason that there is more data to back up. This is becoming problematic because organizations have smaller windows of opportunity in which to conduct backup operations, for reasons that include increasing access requirements and 24 x 7 operations. If storage requirements are doubling every 12-18 months, then the load on the backup system is also doubling every 12-18 months. Scaling a backup system is not easy at the best of times, and it becomes harder still when the data load is constantly growing. Unless we can either reduce the amount of data that is regularly backed up or increase the time allocated to backup operations, we are heading down a dangerous path where backup jobs are not regularly run.
This leads to a low probability of successful data backups and an unpredictable ability to recover data.

Scenario – Cost of data recovery

XYZ Tax Services generates $10,000,000 per year from preparing tax returns using its own tax software application. 1,000 tax specialists regularly use the software to generate revenue for XYZ. The average hourly wage for an XYZ employee is $15/hour. Let's assume an average data recovery time of 4 hours should the system go down. The business operates 100 days per year, and a typical outage would result in 20% of the data being lost. Recovery costs average $20,000. However, this recovery cost does not truly represent the cost of the downtime to the company. There are two other costs to consider: the cost of lost data, and the opportunity cost of paying employees for work they are not able to perform.

Average Cost of Lost Data = (Annual Revenue / Operating Days per Year) × % of Data Lost = ($10,000,000 / 100) × 20% = $20,000

Opportunity Cost = Employee Time Lost = Average Data Recovery Time × Average Hourly Rate × Number of Employees = 4 hours × $15/hour × 1,000 = $60,000

Figure 2 – Cost of Outage, % of Total Revenue and Incidents/year for XYZ Tax Service (Data Source SNIA)

As the example above shows, even a few outages in a year can have a significant impact on a company's bottom line.
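The arithmetic in the XYZ scenario can be reproduced with a short Python sketch. The class and field names below are illustrative and do not come from the paper, and the total assumes that the three cost components simply add up for a single outage, which the paper implies but does not state.

```python
from dataclasses import dataclass


@dataclass
class OutageScenario:
    """Inputs for one outage, named after the quantities in the scenario."""
    annual_revenue: float       # dollars per year
    operating_days: int         # days the business operates per year
    data_lost_fraction: float   # fraction of data lost in a typical outage
    recovery_hours: float       # average data recovery time
    hourly_wage: float          # average hourly wage per employee
    employees: int              # employees idle during recovery
    recovery_cost: float        # direct cost of recovering the data

    def cost_of_lost_data(self) -> float:
        # (annual revenue / operating days) x fraction of data lost
        return self.annual_revenue / self.operating_days * self.data_lost_fraction

    def opportunity_cost(self) -> float:
        # recovery time x hourly rate x number of employees
        return self.recovery_hours * self.hourly_wage * self.employees

    def total_cost(self) -> float:
        # Assumption: the three components are additive for one outage.
        return self.recovery_cost + self.cost_of_lost_data() + self.opportunity_cost()


xyz = OutageScenario(
    annual_revenue=10_000_000,
    operating_days=100,
    data_lost_fraction=0.20,
    recovery_hours=4,
    hourly_wage=15,
    employees=1_000,
    recovery_cost=20_000,
)
print(f"Cost of lost data: ${xyz.cost_of_lost_data():,.0f}")
print(f"Opportunity cost:  ${xyz.opportunity_cost():,.0f}")
print(f"Total per outage:  ${xyz.total_cost():,.0f}")
```

Changing a single input, such as the recovery time, shows how quickly the opportunity cost comes to dominate the direct recovery cost.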
Business                          Industry    Hourly Downtime Cost
Brokerage Operations              Finance     $6,450,000
Credit Card Sales Authorization   Finance     $2,600,000
Home Shopping (TV)                Retail      $113,000
Pay-per-view                      Media       $150,000
Catalog Sales                     Retail      $90,000
Airline Reservation               Transport   $90,000
Tele-ticket Sales                 Media       $69,000
Package Shipping                  Transport   $28,000
ATM fees                          Finance     $14,500

Source: Fibre Channel Industry Association

Table 1 – Typical hourly downtime cost for businesses

Inactive Data

As the amount of stored data grows, so too does the amount of inactive data. Studies consistently show that only a small portion of data is frequently accessed. A typical file server contains 20% active data and 80% inactive data, which implies a non-optimal use of primary storage resources. One of the major reasons organizations accumulate significant amounts of inactive data is that they tend to add capacity without considering how data is managed and what gets stored where. Because IT administrators are often under tremendous time pressure, they rarely take the time to plan for moving or migrating data that is no longer current or is infrequently accessed.

Figure 3 – 80% of the data in a typical File Server is accessed infrequently

Aged Data

In most instances, the value and relevance of data to an organization decreases over time. "Aged" data may continue to be stored for extended periods for taxation, regulatory or other business-related reasons. However, maintaining inactive and aged data on online storage resources is both unproductive and costly. There is a tremendous need to implement a suitable storage strategy to handle most of an organization's inactive and aged data.
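The active/inactive split described above can be estimated for a given directory tree with a small Python sketch. The function name `inactive_share` and the 90-day idle threshold are assumptions made for illustration, not figures from the paper. The sketch relies on file access times, which some systems update lazily or not at all (for example, volumes mounted with `noatime`), so the result is only a rough indicator.

```python
import os
import time
from typing import Optional


def inactive_share(root: str, max_idle_days: int = 90,
                   now: Optional[float] = None) -> float:
    """Return the fraction of bytes under `root` whose last access is
    older than `max_idle_days` (a hypothetical threshold)."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86_400  # seconds per day
    total_bytes = 0
    inactive_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or cannot be read
            total_bytes += info.st_size
            if info.st_atime < cutoff:
                inactive_bytes += info.st_size
    return inactive_bytes / total_bytes if total_bytes else 0.0


if __name__ == "__main__":
    share = inactive_share(".")
    print(f"Inactive data under current directory: {share:.0%}")
```

A result near 0.8 would match the typical file server described in the text; files counted as inactive are natural candidates for migration off primary storage.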
Figure 4 – Value of Data with Time

Storage Cost

Inactive data creates longer backup windows, requiring greater resources and increasing management costs. It also results in longer recovery times. Keeping rarely accessed data online can be very expensive for several reasons. With the rapid decline in the price of online storage such as NAS devices, online storage may appear very inexpensive, but the hardware cost alone does not show the complete picture. A comparison based on Total Cost of Ownership (TCO) includes hard costs, such as hardware, software and maintenance, and soft costs, such as the management effort of monitoring, scheduling, reviewing, managing off-site vaulting, and installing and configuring new hardware. An analysis based on TCO will show that online storage is much more expensive than either nearline or offline storage.

Typically, online storage capacity cannot be scaled without purchasing another online storage device, whereas the capacity of nearline and offline storage devices can be expanded by purchasing additional media at a fraction of the cost. One also has to consider the cost of disruption to operations. This occurs all too often when organizations run out of space on their online storage devices because of the explosive growth of their data.

Figure 5 – Storage Management with various storage strategies (distributed DAS, centralized DAS, centralized virtualized)

Until now, the majority of organizations have implemented a distributed management strategy for their direct-attached storage (DAS) infrastructure. This is a simple strategy that comes with a high management cost.
With this strategy, storage management cost and administrative complexity both grow as storage capacity grows. Centralizing storage management activities improves the total cost of management. A centralized DAS management strategy has the potential to let administrators manage twice the storage capacity of a distributed DAS infrastructure using the same resources, which can reduce management cost by 40%. Pooling storage through virtualization improves the total cost of managing the storage infrastructure further. A centralized and virtualized storage environment has the potential to let administrators manage eight times more storage capacity while reducing management cost by 70% compared with a distributed DAS environment. Overall, the cost of managing storage far exceeds the cost of storage hardware.

A Solution: Nearline Storage

Organizations can save significant amounts of time and money by migrating less frequently accessed data to nearline storage devices such as automated jukeboxes and libraries for removable storage media. The key point is that there is no performance loss: data migration makes the system more efficient, and nearline storage offers relatively fast data access. It's a win-win situation!

Figure 6 – A typical nearline storage solution (clients, storage server, media library)

User Impact

With most nearline storage solutions, users and applications are unaware of the physical location of data, and the mapping of logical storage objects to physical storage objects is transparent to them. The data always appears to be online, and the logical storage object appears to have near-infinite capacity. An individual removable storage medium, or a group of media, appears as a logical storage object.
Nearline storage solutions keep archived, inactive data accessible to users and applications while devoting online storage to active data. The only concern is the potential delay in accessing data that has been migrated to removable media. This concern can be greatly alleviated by using a removable storage system that provides reasonable access times. Sometimes users and applications need to know where to look for archived and migrated data, and administrators must keep this in mind when setting up their storage environments.

Elimination of Inactive Data from Backups

This solution reduces backup time, and in turn management cost, by eliminating the repetitive backup of static data. A copy of the static data can be created and removed from the jukebox for off-site storage for disaster recovery purposes. Only new data gets backed up. Excluding migrated data can reduce backup time by as much as 90%.
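The effect of excluding migrated data from backups can be illustrated with a rough Python sketch. It assumes that backup time scales linearly with the volume of data backed up and that data grows by steady doubling. The function names, the 1,000 GB starting volume and the 100 GB/hour throughput are hypothetical values chosen for illustration; only the 80% inactive share and the 18-month doubling period come from the paper.

```python
def projected_data_gb(current_gb: float, months: float,
                      doubling_months: float = 18.0) -> float:
    """Project data volume assuming it doubles every `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return current_gb * 2 ** (months / doubling_months)


def backup_hours(data_gb: float, throughput_gb_per_hour: float,
                 migrated_fraction: float = 0.0) -> float:
    """Estimate backup time when `migrated_fraction` of the data has been
    moved to nearline storage and is excluded from the regular backup."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    if not 0.0 <= migrated_fraction <= 1.0:
        raise ValueError("migrated_fraction must be between 0 and 1")
    return data_gb * (1.0 - migrated_fraction) / throughput_gb_per_hour


if __name__ == "__main__":
    data_now = 1_000.0    # GB, hypothetical
    throughput = 100.0    # GB per hour, hypothetical
    for months in (0, 18, 36):
        data = projected_data_gb(data_now, months)
        full = backup_hours(data, throughput)
        reduced = backup_hours(data, throughput, migrated_fraction=0.8)
        print(f"month {months:2d}: full backup {full:5.1f} h, "
              f"after migrating inactive data {reduced:5.1f} h")
```

Under these assumptions, migrating the 80% of data that is inactive cuts the backup window by 80%. A larger inactive share would be needed to reach the 90% upper bound quoted above.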