Next Generation Enterprise Lumada Data Lake

DATASHEET

Make Data Agile and Visible With Next-Generation Enterprise Data Lake

Choose Lumada Data Lake to:
• Control big data costs by extending your Hadoop lakes with Hitachi Content Platform.
• Curate and catalog with data discovery, tracking lineage and metadata enrichment.
• Empower self-service dataflow management with no-code graphical data flow tools.

Hadoop is becoming complex and cost prohibitive to scale, and unable to keep up with diverse workloads. Indeed, only 15% of Hadoop deployments are successful in enterprises. Data lakes have turned into swamps due to improper cataloging, curation and governance of data. Furthermore, overreliance on IT for access requests is creating data friction that impedes business innovation.

For data professionals looking to reduce cost and complexities of big data, Hitachi’s Lumada Data Lake kick-starts their DataOps journey. Built on a flexible cloud-native architecture, it integrates and catalogs data onto a cost-effective and metadata-rich object store, and offers simple self-service management with low to no coding. Unlike public-cloud-based solutions that become expensive to scale and create undesirable vendor lock-in, Lumada Data Lake is a proven, highly flexible and secure hybrid cloud solution. It offers multipetabyte scalability with low total cost of ownership (TCO) from edge to multicloud architectures.

Cost-Effective Object Storage

Hitachi Content Platform (HCP) based object store provides a highly scalable storage foundation for Lumada Data Lake. A software-defined object storage solution, HCP based object store addresses modern data lake requirements. It maintains high performance at hyperscale, is compatible with the Amazon Web Services (AWS) S3 API and supports strong data consistency.

HCP conveniently avoids the cost of ownership and management limitations associated with Hadoop HDFS by allowing storage and compute resources to be scaled independently. It offers hybrid cloud storage that places the right data in the right place, resulting in better storage economics beyond public cloud offerings.

With an innovative, elastic, microservices-based architecture, HCP provides massive scalability to support hundreds of data nodes, trillions of objects and exabytes of data. It also features rich, policy-driven data management and enrichment capabilities with strong data consistency. Its hardware-agnostic architecture can be deployed on any bare-metal server, virtual machine or container, on premises or in the public cloud. When combined with HCP S series nodes, it delivers highly dense, cost-optimized, on-premises storage with significantly lower TCO than Hadoop storage nodes. Erasure coding functionality and automated data integrity checking processes also ensure long-term data protection and availability.

Multicloud support is based on a global namespace that allows for unified management across multiple on-premises cloud deployments as well as AWS. Offering broad infrastructure flexibility, HCP supports any S3 storage endpoint, including but not limited to Hitachi’s own HCP S series nodes, as well as AWS and IBM® storage.

Powerful Data Integration

With its microservices architecture, Lumada Data Lake includes a containerized deployment of Hitachi Vantara’s Pentaho suite and Hitachi Content Intelligence. Pentaho provides broad, future-proofed flexibility to integrate with many popular big data stores. Thus, data is accessed once, then processed, combined and consumed anywhere.

Connectors are available for Hadoop distributions, including Cloudera and MapR, AWS Elastic MapReduce (EMR), Google Cloud Platform and Microsoft Azure HDInsight, as well as popular NoSQL databases, such as MongoDB and Cassandra. With broad connectivity to any data type coupled with high-performance Spark and MapReduce execution, Pentaho simplifies and accelerates the process of integrating existing databases with new sources of structured and unstructured data.

Lumada Data Lake also offers a containerized deployment of Hitachi Content Intelligence for indexing, querying and integrating unstructured data and metadata. Content Intelligence enables better understanding of the stored files, how the content is being used and, ultimately, the interobject relationships. Effective curation of this data can then be achieved by adding searchable metadata tags.

When combined, Pentaho and Content Intelligence with Lumada Data Lake offer unmatched integration of structured, unstructured and semistructured data sources.

Multicloud Flexibility

Unintended public cloud service lock-in can be avoided by remaining cloud agnostic. Hitachi’s multicloud data management lets you put the right data in the right place at the right time. Leverage public cloud services for specific use cases, and conserve on-premises compute resources by copying data to the appropriate cloud-service-based application, returning only related insights to the on-premises location. Lastly, using Hitachi Vantara cloud services, you can categorize and analyze your data footprint, and reduce time to deploy your data lake from weeks to days, or have it be delivered as a managed service.

Self-Service Catalog and Curation

Overreliance on IT can delay responses to data access requests by up to weeks and restrict the ability of businesses to move forward with new data insights. Lumada Data Lake takes some responsibility off of IT with built-in data zones that promote industry best practices in the separation of raw and clean data for curation and compliance purposes. Furthermore, Lumada Data Lake offers data catalog and dataflow studio capabilities for self-service catalog and curation (see Figure 1). This creates IT resource efficiencies while promoting a highly iterative and collaborative DataOps practice.

Figure 1. Data Catalog

Security and Compliance

Hitachi brings decades of experience across thousands of customers to Lumada Data Lake. From a security standpoint, configurable role-based access controls (RBAC) allow for solving data access issues. Sensitive data can also be encrypted at rest, as needed. With respect to compliance, user-defined data placement policies allow for movement of any immutable data into a separate zone, as needed. Moreover, detailed auditing and logging occur for all major events, such as writing or updating objects. And finally, you can set retention and disposal policies for how long the objects need to be stored and how they get programmatically deleted once the policies expire.

Learn more about how Lumada Data Lake can help you to control big data costs, curate and catalog your data and support self-service dataflow management.

Hitachi Vantara

Corporate Headquarters
2535 Augustine Drive
Santa Clara, CA 95054 USA
hitachivantara.com | community.hitachivantara.com

Contact Information
USA: 1-800-446-0744
Global: 1-858-547-4526
hitachivantara.com/contact

HITACHI and Lumada are trademarks or registered trademarks of Hitachi, Ltd. Microsoft, Azure and HDInsight are trademarks or registered trademarks of Microsoft Corporation. All other trademarks, service marks, and company names are properties of their respective owners. DS-517-A BTD September 2019.

