Solution overview Cisco public
Refresh Your Data Lake to Cisco Data Intelligence Platform
The evolving Hadoop landscape
In the beginning of 2019, the providers of the leading Hadoop distributions, Hortonworks and Cloudera, merged. This merger raised the bar on innovation in the big data space, and the new Cloudera launched the Cloudera Data Platform (CDP), which combined the best of Hortonworks' and Cloudera's technologies to deliver the industry's first enterprise data cloud. Recently, Cloudera released CDP Private Cloud Base, the on-premises version of CDP. This unified distribution brought several new features, optimizations, and integrated analytics.
CDP Private Cloud Base is built on Hadoop 3.x. Hadoop has developed many capabilities since its inception, but Hadoop 3.0 is an eagerly awaited major release with several new features and optimizations. Upgrading from Hadoop 2.x to 3.0 is a paradigm shift: it enables diverse computing resources (i.e., CPU, GPU, and FPGA) to work on data and leverage AI/ML methodologies. It supports flexible and elastic containerized workloads managed either by the Hadoop scheduler (YARN) or by Kubernetes, distributed deep learning, GPU-enabled Spark workloads, and more. In addition, Hadoop 3.0 offers better reliability and availability of metadata through multiple standby NameNodes, disk balancing for evenly utilized DataNodes, enhanced workload scheduling with YARN 3.0, and overall improved operational efficiency.
Consideration in the journey of a Hadoop refresh
Despite the capability gap between Hadoop 2.x and 3.x, it is estimated that more than 80 percent of the Hadoop installed base is still on either HDP2 or CDH5, which are built on Apache Hadoop 2.0 and are approaching end of support by the end of 2020. Amid those feature enrichments, specialized computing resources, and end of support, a Hadoop upgrade is a value-added refresh. Considering these enhancements, it is imperative to take a more holistic approach when refreshing your data lake, such as conjoining various frameworks and open-source technologies with the Hadoop ecosystem.
Going forward, the Ozone initiative lays the foundation of the next generation of storage architecture for HDFS, where data blocks are organized in storage containers for higher scale and better handling of small objects. The Ozone project also includes an object store implementation to support several new use cases.
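As a concrete illustration of the multiple-standby-NameNode capability noted above: Hadoop 3 removes the earlier two-NameNode limit, so an HA nameservice can run one active and several standby NameNodes. The following hdfs-site.xml fragment is a minimal, illustrative sketch only; the nameservice name `mycluster` and the `nn1`–`nn3` hosts are placeholders, not values from this document.

```xml
<!-- Hedged sketch: Hadoop 3 HA with one active and two standby NameNodes.
     Nameservice and host names are illustrative placeholders. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <!-- Hadoop 3 allows more than two NameNodes per nameservice -->
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>nn3.example.com:8020</value>
  </property>
</configuration>
```

Similarly, the intra-DataNode disk balancing mentioned above is operated through the `hdfs diskbalancer` command (for example, `-plan` to generate a data-movement plan for a DataNode and `-execute` to apply it).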
© 2020 Cisco and/or its affiliates. All rights reserved.
As the Hadoop journey continues, increasingly powerful software frameworks and technologies are being introduced for crunching big data, and they will continue to evolve and integrate in a modular fashion. Furthermore, specialized hardware such as GPUs and FPGAs is becoming the de facto standard for the deep learning needed to process gigantic datasets expeditiously. Figure 1 shows how AI/ML frameworks and containerization are augmenting the Hadoop ecosystem.
Figure 1. Hadoop 3.0 refresh with AI included
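The GPU-enabled Spark workloads mentioned earlier build on the GPU scheduling introduced in Hadoop 3 (the `yarn.io/gpu` YARN resource type) together with the resource-aware scheduling in Spark 3.x. The spark-defaults.conf fragment below is a hedged sketch under those assumptions; the per-executor amounts and the discovery-script path are illustrative, not from this document.

```
# Hedged sketch: request one GPU per executor on a YARN cluster with
# GPU scheduling enabled (yarn.io/gpu resource type).
spark.executor.resource.gpu.amount          1
spark.task.resource.gpu.amount              1
# Script that reports the GPUs visible to an executor; path is illustrative.
spark.executor.resource.gpu.discoveryScript /opt/spark/scripts/getGpusResources.sh
```

With such a configuration, Spark schedules tasks onto executors according to their declared GPU resources instead of CPU slots alone, which is what allows the AI/ML frameworks in Figure 1 to share the same cluster as traditional analytics.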