Micro Focus Voltage Securedata for Hadoop Protect Sensitive Data in and Beyond the Data Lake

Solution Flyer Security Micro Focus Voltage SecureData for Hadoop Protect sensitive data in and beyond the data lake SecureData for Hadoop at a Glance: The Need to Secure Sensitive Data and personal data in a low-trust environment ■ High performance and scalability match Hadoop in the Hadoop Ecosystem makes it a prime target for hackers and data cluster sizes and speeds. Apache Hadoop is an open source platform thieves. Clearly, techniques for protecting data ■ Broad platform and application support inside and that provides a software framework for highly at petabyte scales are essential if big data outside of Hadoop across Cloudera, and legacy reliable, scalable, distributed storage and pro- breaches are to be mitigated or eliminated. Hortonworks and MapR distributions. cessing of large data sets. Operational effi- But, equally so, such techniques must not pre- ■ Support for Hadoop ecosystem technologies ciencies gained through the use of clusters of vent or otherwise render infeasible the analytic includes MapReduce, Sqoop, Hive, Spark, Kafka, processing for which the Hadoop data lake Storm, NiFi, TDE. low-cost, high-speed, commodity computers enable organizations to ingest and analyze was created. ■ Protects data close to source, retaining usability for applications and analytics, with selective massive amounts of structured, semi-struc- re-identification by authorized actors. tured, and unstructured data. Traditional Data Protection ■ Encryption, tokenization, hashing, and data masking Is Insufficient protection techniques backed by security proofs In enterprise environments, data security is par- The obvious answer to the Hadoop data secu- and standards. amount. Failure to protect sensitive data incurs rity question is to augment infrastructure con- ■ Secure stateless technologies remove overhead of a major risk of data breach, leaking sensitive trols with protection of the data itself. But while storage and management of keys and token tables data to adversaries, and non-compliance with traditional data protection methods, such as ■ Privacy regulation anonymization and increasingly stringent data privacy laws, such storage level encryption and data masking, can pseudonymization guidance supported as the General Data Protection Regulation be deployed to improve security in the Hadoop (GDPR), the California Consumer Privacy Act environment, these approaches are limited (CCPA), and the Health Insurance Portability and when considered in relation to big data analytics. Accountability Act (HIPAA). Big Data use cases such as real-time analytics, centralized data ac- Storage-level encryption protects data at quisition, and staging for downstream systems rest at the disk volume level. This technology require that enterprises create a “data lake”—a prevents attackers who have simply obtained single location for enterprise data assets. physical access to the disk from being able to read it. While this may be a useful control for Hadoop poses unique challenges in securing Hadoop clusters or large data stores where its data lakes, however, which include account- there are frequent disk repairs and swap-outs, ing for the automated, complex replication of it does not protect the data from anyone who data across multiple storage nodes following has obtained legitimate access credentials. ingestion into the Hadoop Distributed File Such attackers can freely extract all the data System (HDFS). Of course, infrastructure con- on the disk in its unprotected form. trols should be used, including protecting the perimeter of the computing environment, and Data masking is a useful technique for obfus- monitoring the activities of users and networks. cating sensitive data, most often used to cre- But, as has been demonstrated time and time ate functional substitutes of live production again, traditional IT security alone cannot pro- data for test, development, and user training. tect an organization from cyber-attacks or pre- However, masking breaks relationships in the vent data exfiltration in even the most tightly data and thus also the ability to glean insights controlled environments. from such relationships. Masked data is also ir- reversible, destroying its value for analytic and Hadoop is vulnerable. Its architecture is open post-processing scenarios in which access and its aggregation of sensitive corporate to the unprotected data is required. Moreover, Solution Flyer Micro Focus Voltage SecureData for Hadoop masking transforms may fail to fully anonymize Big Data analytics that need to re-identify pseu- management and operational issues when certain data against re-identification, particu- donymized data can still be authorized to do so, new or updated technologies are introduced, larly when correlations against other data in the of course, by authenticating to SecureData’s Voltage SecureData for Hadoop provides Hadoop data lake are possible. high-speed interfaces. And if processed data a framework that enables rapid integration needs to be exported for downstream storage with the latest tools and broad utilization for Voltage SecureData for Hadoop and analytics—such as into an enterprise data secure analytics. Is the Solution warehouse for traditional business intelligence Micro Focus® Voltage SecureData for Hadoop (BI) analysis—there are multiple options for Software Development Kits (SDKs), Application provides a set of advanced security solu- re-identifying the data as it exits the data lake Programming Interfaces (APIs), User-Defined tions, including Voltage Format-Preserving or is imported by downstream platforms, such Functions/Extensions (UDFs/UDxs), integra- Encryption* (FPE), Format-Preserving Hash as Vertica and Teradata, both of which are al- tion code samples, and command line tools (FPH), Secure Stateless Tokenization (SST), ready integrated with Voltage SecureData. enable Voltage security to occur natively on and Stateless Key Management, that enable a wide variety of platforms, and support in- the pseudonymization or anonymization of Rapid Technology Evolution tegration with a broad range of infrastructure sensitive data at field and sub-field levels while Requires Flexible Solutions components, including ETL tools, databases, preserving their format, behavior, and meaning. It’s essential for the long-term security of and programs running throughout the Hadoop Characteristics of the original data, including Big Data investments to apply solutions that environment. Supported technologies include character types, alphabets, and numeric rela- can adapt to the rapid evolutions occurring MapReduce, Sqoop, Hive, Spark, Kafka, Storm, tionships, such as date and salary ranges, can in the Hadoop technology space. In contrast NiFi, and TDE. Supported distributions include be maintained along with their referential integ- to agent-based implementations that create Cloudera CDH, Hortonworks HDP, and MapR. rity across distributed data sets, while avoiding the requirements to manage the storage of encryption keys or token tables that traditional solutions incur. Sensor Data Due to the lack of inherent security controls in Hadoop, a best practice is to never allow sen- THE EDGE sitive data to reach HDFS in its clear, unpro- Edge IT tected form. Voltage protection can be applied Flume SecureData at the source before it is imported into Hadoop, Bl tools evoked from an extract-transform-load (ETL) Sensor work on Data protected process as it is transferred to a Hadoop land- Data Center & Cloud IT data ing zone, or from a Hadoop process as it is Any data Source written to HDFS. Such de-identified forms of Business the data can be used in applications, analytic Hive processes use Public UDFs protected data SQL Sqoopp engines, data transfers, and data stores as data “Landing they are, without further modification, and yet a zone” Hadoop breach that exposes such data yields Laptop SparkS nothing of value to the attackers, avoiding the log files TDE Power user Map re-identifies penalties and costs such an event would oth- Storm Reduce data kafka erwise trigger. Server log files __________ * Figure 1. Threats in the IoT Space—Pushing Protection to the Edge NIST SP-800-38G 2 Contact us at: www.microfocus.com Like what you read? Share it. Securing the Internet of Things licensing for an unlimited number of applica- As the number of Internet connected devices tions running directly on Hadoop or used by A technology company that in the Enterprise continues to multiply, the vol- an ETL or batch process transferring directly provides real-time supply chain ume of data generated and transferred into Big into or out of Hadoop. Protection for additional Data systems like Hadoop is growing expo- Hadoop nodes can be added to these pack- data and analytics for retailers, nentially (see Fig. 1). Data generated from this ages to meet your exact data protection needs. manufacturers, and trading partners, Internet of Things (IoT) is a valued commodity is using Voltage SecureData for for adversaries, as it may contain an organiza- Voltage SecureData for Hadoop Hadoop to de-identify data ingested tion’s Intellectual Property (IP) and sensitive Starter Edition from thousands of hospitals and information, such as personally identifiable in- 1 Key Server and Web Services Server ■ healthcare facilities. The company’s formation (PII), payment card information (PCI), for production or protected health information (PHI). delivery of pharmacy claims Installation kit for Linux platform ■ reconciliation for grocery and Our existing suite of advanced security solu-

Load more