Solution brief
Decoupling Storage from Compute in Apache Hadoop* with Ceph*
An eye-opening possibility for big-data storage emerges with a new proof-of-concept built on Intel® technologies, Quanta Cloud Technology (QCT) hardware, and Ceph.
Most organizations need to keep expanding their storage capacity in response to persistent data growth. But for many large companies, acquiring resources efficiently enough to meet those growing storage needs can be challenging. If the data footprint of a large enterprise already approaches petabyte levels, even a low rate of annual data storage growth can amount to an increase of hundreds of terabytes every year. At such a scale, any inefficiencies in capital expenditures (CapEx) and resource allocation are greatly magnified.
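To make the scale argument concrete, the short sketch below works through the arithmetic. The footprint and growth rate are hypothetical values chosen only for illustration; they are not figures from the proof-of-concept.

```python
def annual_growth_tb(footprint_tb: int, growth_percent: int) -> float:
    """Return the storage added in one year, in terabytes.

    Both arguments are illustrative inputs, not measured values.
    """
    return footprint_tb * growth_percent / 100


# Hypothetical enterprise: 5 PB (5,000 TB) footprint growing 5% per year.
footprint_tb = 5_000
growth_percent = 5

added_tb = annual_growth_tb(footprint_tb, growth_percent)
print(f"Added per year: {added_tb:.0f} TB")
```

Under these assumed numbers, a single-digit growth rate already lands in the hundreds-of-terabytes range described above.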
Inefficiencies of Scaling Hadoop
When it comes to acquiring storage capacity, IT decision makers commonly choose to archive data in the Hadoop Distributed File System* (HDFS*) so that they can perform analytics and gain business-relevant insights from that data. The problem? Storage and compute resources in Apache Hadoop* are bound together, so when organizations acquire more Hadoop storage, they end up purchasing compute resources that they might not need. Over time, these purchasing habits leave more and more compute capacity unused, which wastes both processing cycles and IT spending. Acquiring Hadoop storage is also inefficient in another important respect: Hadoop storage can be used only for Hadoop workloads. If a company needs storage for other types of workloads, it must purchase additional capacity dedicated to storage technologies other than HDFS.
Advantages to Disaggregating Compute and Storage in Hadoop
Hadoop and HDFS were originally designed with direct-attached storage (DAS) in mind. But if it were possible to separate Hadoop compute and storage, enterprises could be more agile and flexible in responding to customer needs, and they could reduce operational expenditures (OpEx) and CapEx. For example, compute servers could be virtualized to provide faster deployments, enhanced security, and multitenancy. Disaggregating storage and compute could also allow companies to scale these resources independently and purchase only what they need of each.
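The difference between coupled and independent scaling can be sketched as a simple capacity-planning calculation. This is a minimal illustration, not a sizing tool: the node sizes, the demand figures, and all function names are hypothetical assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    """Capacity an organization actually needs (hypothetical figures)."""
    storage_tb: int
    compute_cores: int


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def coupled_node_count(demand: Demand, node_tb: int, node_cores: int) -> int:
    """Nodes needed when every node bundles storage and compute (DAS model).

    The larger of the two requirements dictates the purchase.
    """
    return max(ceil_div(demand.storage_tb, node_tb),
               ceil_div(demand.compute_cores, node_cores))


def disaggregated_node_counts(demand: Demand, storage_node_tb: int,
                              compute_node_cores: int) -> tuple[int, int]:
    """(storage nodes, compute nodes) when each tier scales on its own."""
    return (ceil_div(demand.storage_tb, storage_node_tb),
            ceil_div(demand.compute_cores, compute_node_cores))


# Hypothetical storage-heavy workload and node sizes (assumptions only).
demand = Demand(storage_tb=960, compute_cores=128)
NODE_TB, NODE_CORES = 48, 32

coupled = coupled_node_count(demand, NODE_TB, NODE_CORES)
idle_cores = coupled * NODE_CORES - demand.compute_cores
storage_nodes, compute_nodes = disaggregated_node_counts(
    demand, NODE_TB, NODE_CORES)

print(f"Coupled: {coupled} nodes, {idle_cores} cores idle")
print(f"Disaggregated: {storage_nodes} storage + {compute_nodes} compute nodes")
```

In the coupled model, the storage requirement alone sets the node count, so the compute that comes bundled with each node goes unused. In the disaggregated model, each tier is sized against its own requirement.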
[Figure: Apache Hadoop* clusters and Ceph* storage. Labels recoverable from the diagram: Apache Hadoop* clusters; input/output (I/O); Hadoop (x4); object storage device (OSD); Ceph journal; Intel CAS; HDD 1, HDD 24; Ceph (x4); NVMe* 1, NVMe 2.]