Big Data Storage IGATE
Total Page:16
File Type:pdf, Size:1020Kb
Big Data Storage Apurva Vaidya Anshul Kosarwal IGATE 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Agenda Abstract Big Data and its characteristics Key requirements of Big Data Storage Triage and Analytics Framework Big Data Storage Choices Hyper scale, Scale Out NAS, Object Storage Impact of Flash memory and tiered storage technologies Big Data Storage Architecture Performance Optimization Conclusion 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Abstract Big Data is a key buzzword in IT. And for this, proper storage options should be in place to handle The amount of data to be stored and BIG DATA processed has exploded to a mind boggling degree. Although "Big Data Analytics" has evolved to get a handle on this information content, the data must be appropriately stored for easy and efficient retrieval. 2014 Storage Developer Conference. © IGATE. All Rights Reserved. “The pessimist – complains about the wind The optimist – expects it to change The leader – adjusts the sails” John Maxwell 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Big Data & Its Characteristics 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Storage Capacity LARGE ANALOG 19 Exabytes DIGITAL 280 Exabytes Beginning of the Digital age Global Information Storage Capacity ANALOG 2.6 Exabytes 94% 2017 DIGITAL 7,235 Exabytes 0.02 Exabytes 50% INFORMATION STORAGE CAPACITYINFORMATION 25% % Digital 3% SMALL 1% 1986 1993 2000 2002 2007 Source: Business Computing World (http://www.businesscomputingworld.co.uk/storage-predictions-for-2014/) 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Key Characteristics of Big Data It’s estimated that 2.5 TRILLION GIGABYTES Of data are created each day The Variety Volume Four V’s DIFFERENT SCALE OF DATA FORMS OF Terabytes of Big DATA Records Transactions Structured Tables, files Data Unstructured Semistructured By 2016 there will be 18.9 Billion Sensor Data Network Connections ? Veracity Velocity UNCERTAINTY ANALYSIS OF DATA Batch 1 IN 3 BUSINESS Near time Real time LEADERS Streams Don’t trust the information they use to make decisions Source: http://www.ibmbigdatahub.com/infographic/four-vs-big-data 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Prioritizing Things • Data growth • Enterprise growth • Analytics/business • System • Attracting & Intelligence performance and retaining new • Mobile scalability customers technologies • Network • Reducing • Cloud computing Concerns congestion and enterprise costs connectivity Business Priorities Business Technology Preferences Technology 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Requirements of Big Data Storage 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Key Requirements of Big Data Storage Provide tiered Scalable storage Self-Managing Highly Available Requirements Widely Accessible Security Self Healing Integrate with legacy applications 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Triage and Analytics Framework 2014 Storage Developer Conference. © IGATE. All Rights Reserved. TAF Overview Device Sensor Logs Data Log Analysis Sensor Data Analytics Pattern Identification Real-time Analytics Triage Predictive Query Expression Analysis Columnar Pattern Matching Storage Hadoop MPP Database Actions 2014 Storage Developer Conference. © IGATE. All Rights Reserved. The way we discovered our needs IGATE has its ‘Triage and Hadoop DAS Storage But as data grew, there was performance Analytics framework’ for proactive diagnostic and drop predictive analysis Working Good May be it was time for us to think about the storage that goes below and its scalability So, in order to OPTIMIZE storage we went on to discover storage options which can scale Fast for computation Hadoop Leverages Individual nodes, servers, CPU, memory and RAM RAID support Scale-out nodes on a Hadoop Efficient For cost and performance framework. Global Namespace Accessed and managed as a single system High Availability 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Big Data Storage Choices 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Storage Options 1 2 3 4 DAS, Hyper Traditional NAS, Object Scale Tape, Scale Storage SAN Storage Out NAS etc Storage 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Traditional Storage Options Traditional Storage Options Fails Business Human Machine When It Comes to BIG DATA Data Information Data Needs UnStructured Data Legacy Storage Semi Options Structured Data Structured Data Traditional Storage Disk Storage Tape Storage Optical Jukebox 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Scale out NAS, Object Storage and HSS 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Scale Out NAS Built to Deliver a flexible and scalable platform for Big Data. Simplify management of big data storage infrastructure. Increase efficiency and reduce costs. Why it can be useful? Simple to scale Scale Out NAS is a very simple to scale platform for Big Data. Predictable The performance and reliability is predictable Efficient It leverages all the resources in an efficient way Available High Availability support is within the FS and hence can be guaranteed Enterprise-proven Technology is matured and time tested 18 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Object Storage Built to Optimized object storage systems purpose built for petabytes scale, which also enables organizations to build cost-efficient storage pools for all their unstructured data needs. Provide a highly reliable, infinitely scalable, flexible pricing options, and the ability to store, retrieve, and leverage data the way you want to. It provides? Maximum Portability Lowest cost of ownership Provide set of interfaces including C++ and Java APIs, REST for web It is delivered on commodity based platform for lowest cost of applications and file gateways to support file-based workflows. ownership. Scalable to Petabytes and beyond Index and Search The object storage systems are scalable to petabytes of data, Indexing of the metadata allows extremely fast filtering and billions of files and beyond. search capabilities. Immutable data Cloud Capability (Multi Tenancy) Provides best option for unstructured immutable data Storage can be carved up into virtual containers based on who owns the data. 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Hyper Scale Storage Built for Coined to represent the scale-out design of IT systems with the ability to massively expand under a controlled and efficient managed, software-defined framework. It enables the construction of storage facilities which caters for very high volumes of processing and manages and store petabytes (PB) or even greater quantities of data. Key Criteria It works with a notion of treating data independent from the hardware, Elasticity making it easy to grow or shrink volumes as needed, with no interruption in services. Scalability It supports not just terabytes, but petabytes of data, right from the start. High performance and It eliminates hot spots, I/O bottlenecks and latency, hence offering very availability fast access to information. It supports any and every file type, benefiting from the collective Standards based and open collaboration inherent in any open-source system, resulting in constant source testing, feedback and improvements. It supports or includes a complete storage OS stack, hence has a modular, Leading-edge architecture stackable architecture which enables data to be stored in native formats and and design which also avoid metadata indexing, instead storing and locating data20 algorithmically. 2014 Storage Developer Conference. © IGATE. All Rights Reserved. Analytical Comparison Scale Out NAS Object Storage Hyper Scale Storage Scale Out NAS caters problem with software Object storage can scale Hyperscale architectures typically include dozens management approach., that enables seamlessly for number of to hundreds of generic servers running either a Scalability virtualization layer to makes the nodes behave objects, Exabyte capacity and clustered application or a hypervisor. like a single system. distributed sites. Assuming that hardware will fail, Scale Out This type of storage supports Automatic replication deliver a high level of data infrastructure is built (on commodity Omni-protocol support protection and resiliency. It also eliminates hot hardware) and designed to deal with a higher (reconfigurable bit-stream spots, I/O bottlenecks and latency, hence offering Accessibility / rate of hardware failures. processing in high speed very fast access to information. Availability network). It also supports wide choice of protocols and rich application ecosystem. Scale-out NAS for big data is intelligent Object storage provides Hyperscale gives stripped storage which provides enough to leverage all the resources in highest efficiency with low rapid and efficient expansion to handle big data in Efficiency storage system, regardless of where they are. overhead, no bandwidth kill different use cases, such as Web serving or for By moving data around it optimizes and lowest management cost. some database applications. performance /capacity. Reliability In Scale Out NAS is achieved by Provides all round data Reliability with hyper scale can be achieved by features like hardware RAID, redundant hot- protection with smart installing number of hyper scale manager servers. Reliability