White Paper: SSDs IN THE DATA CENTER

Managing High-Value Data With SSDs in the Mix

Introduction: What the Data Lake Really Looks Like

We don't create and operate data centers just to proclaim we can store lots of data. We own the process of acquiring, formatting, merging, safekeeping, and analyzing the flow of information – and enabling decisions based on it. Choosing a solid-state drive (SSD) as part of the data center mix hinges on a new idea: high-value data.

When viewing data solely as a long-term commodity for storage, enterprise hard disk drives (HDDs) form a viable solution. They continue to deliver value where the important metrics are density and cost per gigabyte. HDDs also offer a high mean time between failure (MTBF) and longevity of storage. Once the mechanics of platter rotation and actuator arm speed are set, an interface chosen and capacity determined, there isn't much else left to chance.

For massive volumes of data in infrequently accessed parts of the "data lake," HDDs are well suited. The predictability and reliability of mature HDD technology translate to reduced risk. However, large volumes of long-life data are only part of the equation.

The rise of personal, mobile, social and cloud computing means real-time response has become a key determining factor. Customer satisfaction relies on handling incoming data and requests quickly, accurately, and securely. In analytics operating on data streams, small performance differences can affect outcomes significantly.

Faster response brings a tiered IT architecture. Data no longer resides solely at Tier 3, the storage layer; instead, transactional data spreads across the enterprise. High-value data generally is taken in from Tier 1 presentation and aggregation layers, and created in Tier 2 application layers. It may be transient, lasting only while a user session is in progress, a sensor is active, or a particular condition arises.

SSDs deliver blistering transactional performance, shaming HDDs in benchmarks measuring I/O operations per second (IOPS). Performance, however, is not the only consideration for processing high-value data.

IDC defines what it calls "target rich" data, which it estimates will grow to about 11% of the data lake within five years, against five criteria:1

Easy to access: is it connected, or locked in particular subsystems?
Real-time: is it available for decisions in a timely manner?
Footprint: does it affect a lot of people, inside or outside the organization?
Transformative: could it, analyzed and actioned, change things?
Synergistic: is there more than one of these dimensions at work?

Instead of nebulous depths of "big data," we now have a manageable region of interest. High-value data usually means fast access, but it also means reliable access under any conditions, including traffic variety, congestion, integrity through temporary loss of power, and long-term endurance. These parameters suggest the necessity for a specific SSD class – the data center SSD – for operations in and around high-value data.

Choosing the right SSD for an application calls for several pieces of insight. First, there is understanding the difference between client and data center models. Next is seeing how the real-world use case shapes up compared to synthetic benchmarks. Finally, recognizing the characteristics of flash and other technology inside SSDs reveals the attributes needed to handle high-value data effectively.

Different Classes: When Not Just Any SSD Will Do

Flash-based SSDs are not all the same, despite many appearing in a 2.5" form factor that looks like an HDD. Beyond common flash parts and convenient mounting and interconnect, philosophical differences create two classes of SSDs with important distinctions in use.2

Consumer-Class SSDs                          Data Center SSDs
Latency increases as workloads increase      Lower latency
Built for short bursts of speed              Designed for sustained performance
Lower mixed workload capabilities            Mixed workload I/O

Client SSDs are designed primarily as replacements for HDDs in personal computers. A large measure of user experience is how fast the machine boots an operating system and loads applications. SSDs excel here, providing very fast access to files in short bursts of activity. RAM caching, using a dedicated region of PC memory, or software data compression can be used to improve performance.

But typical PC users are not constantly loading or saving files, so the SSD in a PC often sits idle. Client SSDs with low idle power can dramatically reduce system power consumption. Idle time also allows a drive to catch up, completing queued write activity and performing background cleanup known as garbage collection, recovering flash blocks where data has been deleted.

Increasing requests on a client SSD often result in inconsistent performance – which most PC users tolerate, realizing they kicked off too many things at once. Users can also pay a price during operating system crashes or power failures, without protection from losing data or corrupting files.

Data center SSDs are designed for speed as well, but also prioritize consistency, reliability, endurance, and manageability. Most applications use multiple drives connected to a server or storage appliance, accessed by numerous requests from many sources. Idle time is reduced in 24/7 operation. Lifespan becomes a concern as flash memory cells wear with extended write use.

An SSD that stalls under load, suffers data errors, or worse yet fails entirely can put an entire system at risk. Enterprise-class flash controllers and firmware avoid any dependence on host software for performance gains; with many drives in use, such software would consume scarce processing resources. Consistency means high-performance command queue implementations, combined with constrained background cleanup.

To protect sensitive data, data center SSDs implement several strategies. Advanced firmware uses low-density parity check (LDPC) error correction, with a more efficient algorithm taking less space in flash and resulting in faster writes. Surviving system power interruptions requires power-fail protection (PFP), with tantalum capacitors holding power long enough to complete pending write operations. If encryption is required, self-encrypting drives implement algorithms internally in hardware.

Mean time between failure (MTBF) was critical for HDDs, but is mostly irrelevant for SSDs. Useful comparisons for SSDs involve two metrics: TBW and DWPD. TBW is total bytes written, a measure of endurance. DWPD – device writes per day – reflects the number of times the entire drive capacity can be written daily, for each day during the warranty period. The latest V-NAND flash technology is beginning to appear in data center SSDs, offering up to double the endurance of planar NAND.

Some system architects overprovision across multiple client SSDs to offset consistency and endurance concerns. Overprovisioning counts on greater idle time, reserves more free blocks, and uses more drives – incurring more cost, space, and power than necessary, compared to using fewer data center SSDs. Understanding use cases and benchmarks can help avoid this expensive practice.
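The TBW and DWPD figures above are tied together by simple arithmetic: a drive's DWPD is its TBW spread across its capacity and warranty period. A minimal sketch in Python (the capacity, TBW, and warranty figures below are hypothetical, not the specifications of any particular drive):

```python
def dwpd(tbw_tb: float, capacity_gb: float, warranty_years: float) -> float:
    """Device writes per day implied by a TBW endurance rating.

    DWPD = total bytes written / (drive capacity x warranty days),
    with TBW in decimal terabytes and capacity in decimal gigabytes.
    """
    capacity_tb = capacity_gb / 1000.0
    return tbw_tb / (capacity_tb * warranty_years * 365.0)

# Hypothetical example: a 960 GB drive rated at 600 TBW over a
# five-year warranty can absorb about one-third of a full-drive
# write every day of the warranty period.
print(round(dwpd(600.0, 960.0, 5.0), 2))  # 0.34
```

Because DWPD normalizes endurance by capacity and warranty length, it lets drives of different sizes be compared directly.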

Where Real-World Meets Mixed Loads

Application servers often seek to create homogeneity, allowing predictability to be built in – workload optimization is the term in vogue. If architects have the luxury of partitioning a data center into servers each performing dedicated tasks, it may be possible to focus and optimize SSD operations.

Read-intensive use cases are typical of presentation platforms such as web servers, social media hosts, search engines and content-delivery networks. Data is written once, tagged and categorized, updated infrequently if ever, and read on demand by millions of users. Planar NAND has excellent read performance, and limiting the number of writes extends the longevity of an SSD using it.

Write-intensive use cases show up in platforms that aggregate transactions. Speed is often critical, such as in real-time financial instrument trading where milliseconds can mean millions of dollars. Other examples are email servers, gathering data from sensor networks, ERP systems and data warehousing. Flash writes run slower than reads because of the steps required to program the state of individual cells, steps that also subject the cells to wear from the voltages applied. With larger, more durable flash cell construction and streamlined programming algorithms, V-NAND-based SSDs offer better write performance and greater endurance.

In reality, few enterprise applications operating on high-value data are overwhelmingly biased one way or the other. Overall responsiveness is determined by how an SSD holds up when subjected to a mix of reads and writes in a random workload. Applications often run on a virtual machine, consolidating a variety of requesters and tasks on one physical server, further increasing the probability of mixed loads.

Client SSDs that look good in simplistic synthetic benchmarking with partitioned read or write loads often fall apart in real-world scenarios of mixed loading. Data center SSDs are designed to withstand mixed loading, delivering not only low real-time latency but also a high performance consistency figure.

Key Benchmarks for Mixed Loading

Synthetic benchmarks characterizing SSDs under mixed loading have three common parameters, with two less obvious considerations:3

Read/write requests would nominally be 50/50; many test results focus on 70/30 as representative of OLTP environments, while JESD219 calls for 40/60. Simply averaging independent read and write results can lead to incorrect conclusions.

Random transfer size is often cited at 4KB or 8KB, again typical of OLTP – producing higher IOPS figures. Bigger block sizes can increase throughput, possibly overstating performance for most applications. JESD219 places emphasis on 4KB.4

Queue depth (QD) indicates how many pending requests can be kicked off, waiting for service. Increasing QD helps IOPS figures, but may never be realized in actual use. Lower QDs expose potential latency issues; higher QDs can smooth out responses to requesters in a multi-tenant environment.

Drive test area should not be restricted to a small LBA (logical block address) range, amounting to artificial overprovisioning. SSDs should be preconditioned, and accesses directed across the entire drive to engage garbage collection routines.

Entropy, or randomness of patterns, should be set at 100% to nullify data reduction techniques and expose write amplification issues. Compression and other algorithms may reduce writes, but performance gains are offset if real-world data is not as uniform as expected.

Why QoS Matters Most in High-Value Data

Benchmarking SSD read and write performance is straightforward, reflected on manufacturer data sheets. Usually, evaluation with mixed-load scenarios is left to independent testing, with some results available in third-party reviews. Which metrics best gauge data center SSD performance?

Cost per gigabyte was the traditional measure of an HDD. For client SSDs, cost per gigabyte is somewhat applicable when directly replacing an HDD, but omits IOPS and other advantages. Cost per gigabyte figures skew in favor of large HDD capacities, an artifact from an era of relatively expensive flash. SSDs have made huge strides with reduced flash cost and increased capacity, a trend that will continue with V-NAND.

IOPS per dollar and IOPS per watt are popular metrics for SSDs. They capture the advantage SSDs provide in transactional speed and power consumption compared to HDDs. However, neither accounts for important differences between client and data center SSDs.

In high-value data use, the deciding metric for data center SSDs is quality of service (QoS). With high IOPS ratings a given for state-of-the-art data center SSDs, QoS accounts for latency, consistency, and queue depth. Even short periods of non-responsiveness are generally unacceptable in high-value data environments. Testing for mixed-load QoS can quickly discriminate a client SSD from a data center SSD.

QoS implies a baseline where essentially all pending requests, often stated in four or five nines (99.99% or 99.999%), finish within a maximum allotted response time. Peak performance becomes a bonus if favorable conditions exist for a short period. Rather than portray a high level of IOPS only achievable under near-perfect conditions, QoS reflects a consistent, reliable level of performance.

Other considerations also highlight the difference between data center SSDs and client SSDs:

TBW per dollar is an emerging metric for data center SSDs. It reflects the value of longevity in write-intensive and high-value data scenarios, especially for V-NAND-based data center SSDs with their greatly increased write endurance.

Client SSDs generally sit idle, and rarely incorporate power-fail protection for cost reasons. Data center SSDs are likely under significant load for a much higher percentage of time; idle power becomes a valley minimum against a baseline of average power consumption.

Write amplification – a complex phenomenon where logical blocks may be written multiple times to physical blocks to satisfy requirements such as wear leveling and garbage collection – can mean that while the host thinks a transfer is complete, the SSD is still dealing with writes. Data center SSDs with advanced flash controllers are designed to reduce write amplification to near 1, as part of maintaining QoS.

The number of available SATA 3 host ports is massive. V-NAND-based SATA 3 SSDs can saturate the interface with sustained transfers, especially for large block sizes. This does not automatically imply SATA 3 is a bottleneck; data center SSD upgrades often target a slower, tapped-out SATA HDD or inconsistent client SSDs. The ultimate solution may be SATA Express, with its co-mingling of SATA devices and NVMe devices, which are just beginning to appear.

Data center SSDs are designed to provide the best combination of value, performance, and consistency while mitigating risk factors common in IT environments. When high-value data must be counted on, day in and day out, data center SSDs with better QoS figures are the best choice.

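A four- or five-nines QoS figure is a statement about the tail of the latency distribution rather than its average. The sketch below, using hypothetical sample data, shows how a 99.99% response-time bound is read from measured samples, and why an average can hide exactly the stalls QoS is meant to expose:

```python
import math

def qos_latency_ms(latencies_ms: list[float], nines: float) -> float:
    """Latency bound met by the given fraction of requests.

    nines=0.9999 answers: 99.99% of requests finished within how long?
    Uses the nearest-rank percentile method on measured samples.
    """
    ordered = sorted(latencies_ms)
    # Small epsilon guards against floating-point error in nines * n.
    rank = max(1, math.ceil(nines * len(ordered) - 1e-9))
    return ordered[rank - 1]

# Hypothetical sample: 10,000 requests, mostly fast, with a few
# garbage-collection stalls hiding in the tail.
samples = [0.5] * 9990 + [8.0] * 9 + [50.0]
print(qos_latency_ms(samples, 0.9999))   # 8.0   four-nines latency
print(qos_latency_ms(samples, 0.99999))  # 50.0  five-nines latency
print(sum(samples) / len(samples))       # 0.5117, an average that hides the stalls
```

The averages on a data sheet would rate this hypothetical drive as excellent; the percentile view shows what one request in ten thousand actually experiences.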
How High-Value Data Wins

In the bigger picture, data center SSDs deliver against the original five dimensions of high-value data identified:

Easy to access: SATA 3 interfaces are widely available, and almost any computer system can upgrade to SATA 3 data center SSDs without adapter or software changes.

Real-time: Data center SSDs provide fast, consistent, reliable storage, enabling applications to meet exacting performance demands.

Footprint: High-value data is by definition impactful to users, and 24/7 responsiveness with years of endurance – extended by Samsung V-NAND – is what data center SSDs do best.

Transformative: It is not just in storing data, but in detailed analysis supporting transparency and rapid decision-making, that data center SSDs come to the front.

Synergistic: As the body of high-value data grows, data center SSDs will keep pace, helping IT teams meet organizational needs and achieve new breakthroughs.

Most comparisons between SSDs are too basic to evaluate real-world needs in high-value data scenarios. Inserting a client SSD where a data center SSD should be can lead to inconsistent and disappointing results. Benchmarking under mixed loading helps identify data center SSDs providing better QoS. As data centers evolve and grow with high-value data in the mix, success relies on understanding the distinction between "just any SSD" and data center SSDs designed for the role.

Samsung Data Center SSD Lineup

Samsung 845DC EVO
USE: For read-intensive applications
NAND TYPE: Samsung 19nm Toggle 3-bit MLC NAND
PERFORMANCE: Sequential read of up to 530 MB/s; sequential write of up to 410 MB/s
QoS (4KB, QD 32): 99.9% - Read 0.6ms / Write 7ms
CAPACITIES: 240GB, 480GB and 960GB

Samsung 845DC PRO
USE: For mixed and write-intensive applications
NAND TYPE: Samsung 24-layer 3D V-NAND 2-bit MLC
PERFORMANCE: Sequential read of up to 530 MB/s; sequential write of up to 460 MB/s
QoS (4KB, QD 32): 99.9% - Read 0.6ms / Write 5ms
CAPACITIES: 400GB and 800GB

1 "High Value Data: Finding the Prize Fish in the Data Lake is Key to Success in the Era of the Third Platform", IDC, April 2014
2 "SSD Performance – A Primer", SNIA, August 2013
3 "Benchmarking Utilities: What You Should Know", Samsung white paper
4 "JEDEC Standard: Solid-State Drive (SSD) Endurance Workloads", JESD219, JEDEC, September 2010

Learn more 1-866-SAM4BIZ | samsung.com/business | youtube.com/samsungbizusa | @SamsungBizUSA

© 2015 Samsung Electronics America, Inc. All rights reserved. Samsung is a registered trademark of Samsung Electronics Co., Ltd. All products, logos and brand names are trademarks or registered trademarks of their respective companies. This white paper is for informational purposes only. Samsung makes no warranties, express or implied, in this white paper. WHP-SSD-DATACENTER-JAN15J