MEETING BIG DATA CHALLENGES WITH EMC ISILON STORAGE SYSTEMS
Anuj Sharma
Contents
Abstract
Big Data Challenges and EMC Isilon Storage Systems
Big Data Value-Add To Business
OneFS Architecture
Which EMC Isilon Cluster to choose?
EMC Isilon Cluster Networking Best Practices
EMC Isilon Smart Connect Internals
SmartConnect Architecture Example
EMC Isilon Smart Quotas Internals
EMC Isilon and vSphere Integration Best Practices
EMC Isilon SyncIQ Architecture and Tips
EMC Isilon NDMP Backup Configuration for EMC NetWorker
Cluster Performance Tuning
EMC Isilon Cluster Maintenance
References
Glossary

Figures
Figure 2: Big Data Sources
Figure 3: Isilon Node Types
Figure 5: Big Data Enabled Property and Casualty Insurance Policy Premium Factors
Figure 6: OneFS vs. Traditional File Systems
Figure 7: OneFS vs. Traditional File Systems
Figure 8: Isilon Cluster
Figure 9: OneFS Protection
Figure 10: 10GigE Networking with Accelerator Node
Figure 11: Redundant Internal Network Topology
Figure 12: SmartConnect Communication
Figure 14: SmartConnect Configuration
Figure 15: Optimizing Isilon NFS for VM I/O Operations
Figure 16: Isilon NFS Architecture
Figure 17: Isilon iSCSI Architecture
Figure 19: Direct NDMP Method
Figure 20: Remote NDMP Model

Disclaimer: The views, processes, or methodologies published in this article are those of the author. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.

2012 EMC Proven Professional Knowledge Sharing

Abstract

What is Big Data? Big Data does not refer to a specific type of data; any kind of unstructured data can be considered Big Data once individual files reach terabytes in size. Digital data is growing at an exponential rate, and “big data” is the new buzzword in IT circles. According to International Data Corporation (IDC), the amount of digital data created and replicated will surpass 1.8 zettabytes (1.8 trillion GB) in 2011, having grown by a factor of nine in just five years.

The information we deal with today is very different from the information we dealt with 20 to 30 years ago. Chip manufacturers render terabyte files, oil and gas exploration companies analyze terabytes of data, advancements in healthcare have led to high-definition 4-D imaging files that range up to terabytes, and NASA handles a large number of terabyte-sized files. Social networking sites and online communities generate data in huge volumes. It seems no industry is safe from massive data growth, and the storage implications are profound.

Figure 1: Data Growth

The massive amount of rich unstructured file data generated by richer file formats and Internet-era computing is creating demand for new and innovative scale-out file storage solutions that can economically scale bandwidth and performance to previously unheard-of capacities.

I have seen many organizations relying on scale-up storage for Big Data storage and analytics.
However, in the long run, companies are bound to encounter performance and capacity problems when they use scale-up storage systems for Big Data. Scale-up storage systems are monolithic: a large amount of storage sits behind one or two file server heads, and the system is designed to scale only to the multi-terabyte range behind those heads. Once the capacity and performance limits are reached, a new monolithic storage system must be added. Because the file systems residing on the existing scale-up system cannot be expanded to use the storage of the new one, a new file system has to be created and managed, even if only a small amount of incremental capacity is needed. This is one of the problems that enterprises run into when dealing with Big Data on scale-up storage: a single file system is limited to terabytes, and file system migration is often a painful exercise that requires downtime.

Traditional analysis tools also cannot be used to analyze Big Data. Data needs to be mined in real time and the results published. For example, a retailer can see instantaneously which stores are most profitable, which items are in demand, what consumers choose in each region, and so on. Big Data analysis plays a significant role in the future business decisions of an organization. Mining such huge volumes of data requires parallel processing. Consequently, scale-out storage systems are the best candidates for workloads that need highly parallel processing power.

Scale-out storage architectures are significantly different from the monolithic scale-up storage architectures (e.g., traditional NAS or SAN systems) that were developed to meet distributed computing needs. “Scale-out NAS” systems are designed from the ground up to scale dynamically and economically, and to support extremely high-bandwidth applications that deal with the multi-terabyte files often referred to as Big Data.
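
The contrast between the two architectures can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the 16 TB per-file-system ceiling and the 36 TB per-node capacity are hypothetical figures chosen for the example, not EMC specifications, and the function names are invented here. It counts how many separate file systems a scale-up design would have to manage for a given capacity demand, and how many nodes a scale-out cluster would need while still presenting one file system.

```python
import math

# Hypothetical figures for illustration only; not vendor specifications.
SCALE_UP_MAX_FS_TB = 16.0   # assumed ceiling for one scale-up file system
SCALE_OUT_NODE_TB = 36.0    # assumed usable capacity contributed per node


def scale_up_file_systems(demand_tb: float, max_fs_tb: float = SCALE_UP_MAX_FS_TB) -> int:
    """File systems to manage when each one is capped at max_fs_tb."""
    return math.ceil(demand_tb / max_fs_tb)


def scale_out_nodes(demand_tb: float, node_tb: float = SCALE_OUT_NODE_TB) -> int:
    """Nodes needed in a cluster that presents a single file system."""
    return math.ceil(demand_tb / node_tb)


if __name__ == "__main__":
    for demand_tb in (10, 50, 200):
        print(
            f"{demand_tb:>4} TB: scale-up needs {scale_up_file_systems(demand_tb)} "
            f"file system(s); scale-out needs {scale_out_nodes(demand_tb)} node(s), "
            "1 file system"
        )
```

Under these assumed numbers, 200 TB of demand means 13 separately managed file systems on scale-up storage, but a single file system spread over 6 nodes on a scale-out cluster. The sketch deliberately ignores data protection overhead and any minimum cluster size.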
The EMC Isilon® storage system is the world leader in the scale-out NAS category.

Figure 2: Big Data Sources

In this article we discuss why scale-up storage cannot efficiently meet the performance, cost, and capacity requirements of Big Data, and how scale-out storage meets them. We also discuss an example of how Big Data can add value to a business. Additionally, the following areas are covered:
• Big Data Challenges
• EMC Isilon Storage Systems and OneFS® Architecture
• EMC Isilon Storage Systems features such as SyncIQ, SmartConnect, and SmartQuotas
• Best practices for the implementation of EMC Isilon clustered storage systems
• Best practices for vSphere 4 integration with EMC Isilon Storage Systems
• Cluster Maintenance
• EMC Isilon NDMP Backup Configuration
And much more …

Big Data Challenges and EMC Isilon Storage Systems

• Unstructured data is being generated at exponential rates
The pace of data growth requires storage that can scale on demand. Typically, storage is purchased with future or peak requirements in mind. Buying that way usually means spending more than necessary, because the cost of equipment declines over time. EMC Isilon lets you add storage on demand instead of buying it all at once: start with a minimal number of nodes and scale out over time.
• Seismic applications, NASA satellite imaging, and high-performance video rendering applications require storage that supports very high IOPS and data transfer throughput
EMC Isilon has a scale-out architecture; whenever a node is added, both storage and compute are added. Hence, compute increases linearly