Fill the Data Lake

Blueprint for Success: A Best Practice Solution Pattern

DATASHEET

Simplify and Accelerate Hadoop Data Ingestion With a Scalable Approach

What Is It?

As organizations scale up data onboarding from just a few sources going into Hadoop to hundreds or more, IT time and resources can be monopolized by hundreds of hard-coded data movement procedures, and the process is often highly manual and error-prone. Hitachi Vantara's Filling the Data Lake blueprint provides a template-based approach to solving these challenges and comprises:

■■ A flexible, scalable, and repeatable process to onboard a growing number of data sources into Hadoop data lakes.

■■ Streamlined data ingestion from hundreds or thousands of disparate comma-separated value (CSV) files or database tables into Hadoop.

■■ An automated, template-based approach to data workflow creation.

■■ Simplified, regular data movement at scale into Hadoop in the Avro format.

■■ Ability to architect a governed process that is highly reusable.

■■ Robust integration with the broader Hadoop ecosystem and semistructured data.

Why Do It?

■■ Reduce IT time and cost spent building and maintaining repetitive big data ingestion jobs, allowing valuable staff to dedicate time to more strategic projects.

■■ Minimize the risk of manual errors by decreasing dependence on hard-coded data ingestion procedures.

■■ Automate business processes for efficiency and speed, while maintaining data governance.

■■ Enable more sophisticated analysis by business users with new and emerging data sources.

Value of the Platform From Hitachi Vantara

■■ Unique metadata injection capability accelerates time-to-value by automating many onboarding jobs with just a few templates.

■■ Intuitive graphical user interface for big data means existing extract, transform, load (ETL) developers can create repeatable data movement flows without coding: in minutes, not hours.

What Implementing the Fill the Data Lake Blueprint May Look Like in a Financial Organization

This company uses metadata injection to move thousands of data sources into Hadoop in a streamlined, dynamic integration process (a minimal sketch of the pattern follows the list below).

■■ Large financial services organization with thousands of input sources.

■■ Reduce the number of ingest processes through metadata injection.

■■ Deliver transformed data directly into Hadoop in the Avro format.
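The blueprint itself relies on the platform's metadata injection capability rather than hand-written ingestion code. Purely as an illustration of the underlying pattern, the following Python sketch shows how a single generic template, driven by a small metadata list, could move many CSV sources into the Avro format without a separate job per source. The fastavro library, the ingest function, and all source names, paths, and field lists here are hypothetical assumptions for this sketch, not components of the blueprint.

```python
# Illustrative only: a minimal, metadata-driven ingestion sketch in Python.
# The blueprint uses template-based metadata injection on the Hitachi Vantara
# platform; this code just shows the general pattern of one template serving
# many sources. All names, paths, and schemas below are hypothetical.
import csv
from fastavro import parse_schema, writer

# "Metadata" describing each source. In the blueprint, adding a source means
# adding an entry like this rather than writing a new ingestion job.
SOURCES = [
    {"name": "customers", "csv_path": "customers.csv",
     "fields": [("id", "string"), ("region", "string")]},
    {"name": "accounts", "csv_path": "accounts.csv",
     "fields": [("id", "string"), ("balance", "string")]},
]

def ingest(source: dict, out_dir: str = ".") -> None:
    """Apply one generic template to a single source described by metadata."""
    # Build the Avro schema from the source's field metadata.
    schema = parse_schema({
        "type": "record",
        "name": source["name"],
        "fields": [{"name": n, "type": t} for n, t in source["fields"]],
    })
    # Read the CSV rows, keeping only the fields declared in the metadata.
    with open(source["csv_path"], newline="") as fin:
        records = [
            {n: row.get(n, "") for n, _ in source["fields"]}
            for row in csv.DictReader(fin)
        ]
    # Write the rows out in the Avro format.
    with open(f"{out_dir}/{source['name']}.avro", "wb") as fout:
        writer(fout, schema, records)

if __name__ == "__main__":
    # One loop replaces hundreds of per-source, hard-coded ingestion procedures.
    for src in SOURCES:
        ingest(src)
```

In the blueprint, the analogous source metadata would be injected into a reusable transformation at run time, so onboarding another source becomes a metadata change rather than another hand-coded job.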

[Figure: Data flow diagram showing disparate data sources (RDBMS tables and CSV files) onboarded through a metadata injection process and delivered into Hadoop in the Avro format. RDBMS = relational database management system; CSV = comma-separated value.]

About Hitachi Vantara Hitachi Vantara, a wholly owned subsidiary of Hitachi, Ltd., helps data-driven leaders find and use the value in their data to innovate intelligently and reach outcomes that matter for business and society. We combine technology, intellectual property and industry knowledge to deliver data-managing solutions that help enterprises improve their customers’ experiences, develop new revenue streams, and lower business costs. Only Hitachi Vantara elevates your innovation advantage by combining deep information technology (IT), operational technology (OT) and domain expertise. We work with organizations everywhere to drive data to meaningful outcomes. Visit us at HitachiVantara.com.

Hitachi Vantara

Corporate Headquarters: 2535 Augustine Drive, Santa Clara, CA 95054 USA | HitachiVantara.com | community.HitachiVantara.com
Contact Information: USA: 1-800-446-0744 | Global: 1-858-547-4526 | HitachiVantara.com/contact

HITACHI is a trademark or registered trademark of Hitachi, Ltd. Pentaho is a trademark or registered trademark of Hitachi Vantara Corporation. All other trademarks, service marks, and company names are properties of their respective owners. P-023-B BTD April 2019