Lumada DataOps Suite: Ingest, blend, cleanse and prepare diverse data from any source in any environment without code.

DATA SHEET

Pentaho Data Integration (PDI), a Lumada DataOps Suite product, simplifies managing the enormous volumes and increased variety and velocity of data entering organizations. PDI delivers analytics-ready data to end users faster with visual tools that reduce time and complexity. Without writing SQL or coding in languages such as Python, organizations immediately gain real value from their data, from sources like files, relational databases, Hadoop and more, whether in the cloud or on premises.

Turn Data Into Actionable Analytics

Pentaho’s adaptive big data layer allows you to plug into popular big data stores with flexibility and insulation from change. Data can be accessed once, then processed, combined and consumed anywhere. Pentaho’s adaptive big data layer includes plug-ins for Hadoop distributions and object stores from Cloudera, Hortonworks, MapR (HPE Ezmeral Data Fabric), Amazon Web Services, Google Cloud and Microsoft Azure, object stores such as Hitachi Content Platform, as well as popular NoSQL databases like MongoDB and Cassandra.

Integrate and Blend Big Data With Existing Enterprise Data

With broad connectivity to any data type and high-performance Spark and MapReduce execution, Pentaho simplifies and speeds the process of integrating existing databases with new sources of data. Pentaho Data Integration’s graphical designer includes:

● Intuitive, drag-and-drop designer to simplify the creation of analytics data pipelines (see Figure 1).

● Rich library of prebuilt components to access, prepare and blend data from relational sources, big data stores on premises or in the cloud, enterprise applications and more.

● Ability to spot check data in flight with immediate access to analytics, including charts, visualizations and reporting, from any data prep step.

● Powerful orchestration capabilities to coordinate and combine transformations, including notifications and alerts.

● Integrated enterprise scheduler for coordinating workflows and debugger for testing and tuning job execution.

Figure 1. Drag-and-Drop Data Transformation in Pentaho Data Integration

Big Data Processing Performance and Productivity

Pentaho improves performance and reduces the complexity of integrating big data sources. Pentaho provides:

● Code-free data transformation design that delivers 15 times faster productivity versus hand-coding and executes in-cluster for high performance.

● Template-based approach to rapidly onboard data sources into Hadoop via the metadata injection feature set.

● Ability to seamlessly switch between execution engines, such as Spark and Pentaho’s native engine, to fit data volume and transformation complexity (see Figure 2).

● Support for advanced analytics models from languages such as Python and Scala to operationalize predictive intelligence while reducing data prep time.

Broad Connectivity and Data Delivery

Pentaho Data Integration offers broad connectivity to a variety of diverse data, including all popular structured, unstructured and semi-structured data sources. Some examples include:

● Relational database management system (RDBMS): Oracle, IBM® DB2®, MySQL, Microsoft SQL Server.

● Spark and Hadoop: Cloudera, Hortonworks, Amazon EMR, MapR (HPE Ezmeral Data Fabric), Microsoft Azure HDInsights.

● NoSQL databases and object stores: MongoDB, Cassandra, HBase, Hitachi Content Platform, AWS S3, Google Cloud Storage, Microsoft Azure ADLS Gen 2.

● Analytic databases: Redshift, Snowflake, Vertica, Greenplum, Teradata, SAP HANA, Google BigQuery.

● Business applications: SAP, Salesforce, Google Analytics.

● Files: XML, JSON, Microsoft Excel, CSV, txt, Avro, Parquet, ORC, EBCDIC (mainframe), unstructured files with metadata, including audio, video and visual files.

Figure 2. Adaptive Execution With Spark and Visually Designed Hadoop MapReduce Jobs in PDI

To increase the performance of data extraction, loading and delivery processes, Pentaho offers the following capabilities:

● Native connectivity and bulk-loading to most common data sources, including Amazon Redshift and Snowflake.

● Data services to virtualize transformations without staging, making data sets immediately available to reports and applications.

● Automatic creation and publishing of metadata models to drive faster analytic results.

● Process streaming data in real time.

Data Profiling and Data Quality

Pentaho provides data profiling capabilities, such as row counts, mathematical functions and identification of null values, as well as data quality operators, such as string manipulators, mapping functions, filtering and sorting. For name and address verification capabilities, Pentaho integrates with leading data quality vendors, such as Human Inference and Melissa Data. Pentaho data profiling and data quality capabilities help:

● Identify data that fails to comply with business rules and standards.

● Deduplicate and cleanse inconsistent and redundant data.

● Validate, standardize and correct name, address, email and telephone data.

● Replace file names and locations with simple business names by integrating with the Lumada Data Catalog, a component of the Lumada DataOps Suite.

Powerful Administration and Management

Pentaho Data Integration provides out-of-the-box capabilities for managing operations for data integration projects. These capabilities include:

● Shared repository for collaboration among data analysts, developers and data stewards.

● Content management, versioning and locking to easily version jobs for roll-back to prior versions.

● Control over security privileges for users and roles and integration with third-party security systems; ability to set permissions for creating, reading or executing jobs and transformations.

“Moving data across a business is an art. Pentaho transforms art into better business value.”
– Warren Chang, VP of Engineering, Borderfree

Hitachi Vantara

Corporate Headquarters
2535 Augustine Drive
Santa Clara, CA 95054 USA
hitachivantara.com | community.hitachivantara.com

Contact Information
USA: 1-800-446-0744
Global: 1-858-547-4526
hitachivantara.com/contact

HITACHI is a registered trademark of Hitachi, Ltd. Pentaho is a trademark or registered trademark of Hitachi Vantara Corporation. Microsoft, HDInsights, Azure and SQL Server are trademarks or registered trademarks of Microsoft Corporation. IBM and DB2 are trademarks or registered trademarks of International Business Machines Corporation. All other trademarks, service marks, and company names are properties of their respective owners. P-016-G MCoE July 2021