“Big Data- What’s the Big Deal?”

DATA SECURITY & PRIVACY FOR BIG DATA

Cindy E. Compert CIPT/M CTO Data Security & Privacy, IBM Security @CCBigData

1/27/17 “A ship in port is safe; but that is not what ships are built for. Sail out to sea and do new things” – Grace Hopper

2 IBM Security Agenda

• Introduction
• Mega Trends
• Security & Privacy considerations
• Architecture, Technical controls, best practices
• Wrap up

3 IBM Security Short History Lesson

“Big Data”: Volume, Variety, Velocity; MapReduce, Hadoop, HDFS

4 IBM Security Big Data Grows Up

5 IBM Security Client Challenges: Megatrends

. Evolving regulatory patchwork

. Breach threats/costs, reputational risk, sanctions

. The ‘Snowden’ effect

. Maintain privacy, encourage innovation

1 Forrester Research: “Understand The State Of Data Security And Privacy: 2014 To 2015”

6 IBM Security Digital Convergence: IoT, Analytics, Big Data, Cognitive, Cloud

Analytics that Learn

HEALTH: Watson Oncology- Bringing personalized cancer treatment therapies based on a tumor's genetic fingerprints

EDUCATION: Cognitoys- Toy Dino uses cognitive-enabled learning for customized interaction

EDUCATION: … choose … learning to children around the world

Watson Personality Insights: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/personality-insights.html

8 IBM Security A Data Lake is a Data Scientist’s Dream!

… but data without analytics is just a liability

9 IBM Security Security and Privacy Considerations

Security compared to Privacy in a nutshell

Confidentiality: Preventing access to non-public information that two parties agree to restrict. May relate to personal or business information. May not be subject to Privacy laws.

Regulated Data: Government and/or industry regulation; including PI (Personal Information), healthcare (e.g. HIPAA/HITECH), and/or financial information (e.g. FFIEC).

Data Privacy: Controls how personal or regulated information is collected, used, and shared in accordance with policy and/or external laws/regulations.

Data Security: The technical safeguards used to ensure confidentiality, integrity, and availability of data.

PI – Personal Information: Any information that identifies or can reasonably be used to identify, contact, or locate an individual.

[Figure: overlapping areas labeled Regulatory or 3rd Party Requirements, Privacy, Confidentiality, Security, and Security & Privacy]

11 IBM Security Why is Big Data different?

. More data, exponentially more risk
. Immature- less security, governance and discipline, rapidly evolving*
. New types of data, new privacy implications
  . Smart meters, health monitors, connected home, connected car
. Linked data– linking public and private data exposes new risks
. New uses of data mapping to privacy policy

* http://www.techrepublic.com/article/cios-still-dont-care-about-hadoop-data-security/

12 IBM Security Cool or Creepy?

http://www.zdnet.com/pictures/nine-warning-signs-that-your-technology-needs-an-upgrade/2/

13 EU GDPR will change the Analytics and Cognitive Landscape

• Definition of “Personal Data” now explicitly includes online identifiers, location data and biometric/genetic data

• Higher standards for privacy notices and for obtaining consent

• Easier access to personal data by a data subject

• Enhanced right to request the erasure of their personal data

• Right to transfer personal data to another organization (portability)

• Right to object to processing now explicitly includes profiling.

14 IBM Security 1/30/2017 Big Data life cycle – from raw to production

Search & survey: Business Users (With An Idea), Power Users, Data Analysts
. text search
. simple investigations
. peek / poke

Exploratory analysis: Data Scientist / Data Miner, Advanced Business User, Application Developer
. from mountain of data into a structured world with apps to provide business value
. iterative in nature, many false starts

Operational: Traditional IT / Application Developer
. creating/standing up applications, processes, systems with enterprise characteristics
. more formal environment, SLAs, etc

15 IBM Security Fit-for-purpose security and privacy

Initial / exploratory use cases | Used for business decisions
Few security or privacy concerns | Protect, Secure, Encrypt
Sporadic change management | Audit trail tracking access & changes
No data retention requirements | Preserve data for N years
Little to no regulation | Legislated requirements
No / isolated data quality concerns | Data quality imperatives
Sources of information are “interesting” | Sources must be trusted

No difference in data governance requirements once the data is used for making operational business decisions

16 IBM Security Privacy is the ‘Why’ and ‘What’… Security is the ‘How’

PI, PII, PHI, NPI.. What is ‘Personal’? It Depends

CAUTION: Your Legal, Compliance, and Privacy Organization makes a determination of how to enforce privacy regulations, based on risk. IT and InfoSec should not be the arbiters.

18 IBM Security How unique are you?

• Dr. Latanya Sweeney (Harvard, FTC Chief Technologist): a 1997 study using US Census data predicted that 87 percent of the U.S. population had unique combinations of just date of birth, gender, and zip code
• Try it yourself here: http://aboutmyinfo.org
• An additional study on the Personal Genome Project identified 84-97% of records, also using demographics plus data mining (http://dataprivacylab.org/projects/pgp/1021-1.pdf)
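The uniqueness idea above can be checked on any data set by counting how many records have a combination of quasi-identifiers that no other record shares. The following is a minimal sketch, not the method used in the cited studies; the function name, field names, and toy records are hypothetical.

```python
from collections import Counter


def uniqueness_rate(records, quasi_identifiers):
    """Return the fraction of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    # Build one key per record from the chosen quasi-identifier fields.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical toy records, not real data.
records = [
    {"dob": "1980-01-15", "gender": "F", "zip": "02138"},
    {"dob": "1980-01-15", "gender": "F", "zip": "02138"},
    {"dob": "1975-07-04", "gender": "M", "zip": "02139"},
    {"dob": "1990-12-31", "gender": "F", "zip": "10001"},
]

all_three = uniqueness_rate(records, ["dob", "gender", "zip"])
gender_only = uniqueness_rate(records, ["gender"])
print(all_three, gender_only)
```

A high rate for a small set of fields suggests those fields may need masking or generalization before the data is shared.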

19 IBM Security Location Location Location

20 IBM Security Questions to ask

1. Where is the sensitive data?
2. Who owns it?
3. How is it classified and managed?
4. How do you know who is accessing it?
5. Where is it flowing?
6. How is it shared?
7. How is it used in test environments?
8. What about 3rd parties and vendor access?
9. What is the quantifiable risk?
10. How do you prioritize discovery and classification?

21 IBM Security 5 steps to a Critical Data Protection Program

The Approach: A comprehensive method for safeguarding your Crown Jewels and protecting your brand

• Define Crown Jewels
• Determine Data Security Objectives

• Understand Client Data Security Environment and Infrastructure
• Define and Complete Data Discovery Process
• Perform Data Analysis and Classify

• Establish Crown Jewels Baselines
• Assess and Score Client Data Security Processes and/or Controls
• Perform Gap Analysis and Develop Hypotheses

• Determine Risk Remediation Plan
• Prioritize and Validate Risk Remediation Solutions
• Plan, Design, and Implement

• Determine Crown Jewels Governance Metrics and Process
• Enable Monitoring, Communications and Response
• Establish Revalidation Criteria and Process

22 IBM Security Where Next? Data Classification

[Figure: hazard labels shown alongside data classification: Non-flammable; Spontaneously Flammable; When combined with water; Non-toxic; Health Hazard; Toxic; Explosive]

23 IBM Security Architecture, Technical Controls, Best Practices

Security is Security.. Same Disciplines apply… BUT..

[Figure: security capabilities. Items shown: Global Threat Intelligence; Antivirus; Endpoint patching and management; Malware protection; Incident and threat management; Transaction protection; Firewalls; Device management; Sandboxing; Content security; Virtual patching; Network visibility; Fraud protection; Criminal detection; Log, flow and data analysis; Security Intelligence; Anomaly detection; Application scanning; Application security management; Vulnerability assessment; Incident response; Privileged identity management; Data monitoring; Cloud; Data access control; Entitlements and roles; Access management; Identity management; Consulting Services | Managed Services]

25 IBM Security Big Data Technical Components

Understand and navigate federated big data sources | Federated Discovery and Navigation
Manage & store huge volume of any data | Hadoop File System, Apache Spark, MapReduce
Structure and control data | Data Warehousing, In memory, Cloud databases (Spark, Cloudant)
Manage streaming data | Stream Computing
Analyze unstructured data | Text Analytics Engine
Integrate and govern all data sources | Integration, Data Quality, Security, Lifecycle Management, MDM

26 IBM Security 26 A Hadoop Security Architecture

Static Data (at rest) | Dynamic Data (in use) | ..and masking

http://www.hadoopsphere.com/2013/01/security-architecture-for-apache-hadoop.html

27 IBM Security Monitoring and auditing challenges

• Many avenues to access

• Security and authentication are evolving

• Complex software stack with significant log data from each component

• Security and audit viewed in isolation from rest of data architecture

28 IBM Security Data Security and Privacy Core Disciplines

Security Controls Core Disciplines: The ‘How’

Understand & Define
. Discover sensitive assets & who has access
. Classify Assets & Quantify risk
. Assess Vulnerabilities

Secure & Protect
. Implement Identity & Access Management, Activity Monitoring
. Redact/encrypt/mask sensitive data in all environments
. Harden environments to reduce risk

Monitor & Audit
. Define policies and metrics
. Monitor and enforce; Review policy exceptions
. Audit and report for compliance

29 IBM Security Security Controls for Privacy

On-Premise | Hybrid | Cloud

Manage Access: Enforce Separation of duties, Safeguard privileged user access, Applications, and devices
Protect Data: Identify vulnerabilities, Prevent attacks targeting sensitive data
Gain Visibility: Monitor data and applications: Security breaches, Compliance violations

• Data Encryption, Masking, Redaction
• Identity Governance
• Security Information and Event Monitoring
• Security Intelligence
• Privileged Identity Management
• Real-time alerting and blocking
• Data and File Activity Monitoring
• Mobile Data Management
• Cloud access and risk assessment
• Application and Mobile App Scanning

Optimize Your Privacy and Data Security Program Deliver a consolidated view of your security operations

• Privacy Program Management • Security & Privacy Risk and Performance Metrics

30 IBM Security Utilize real-time data activity monitoring for privacy, security & compliance

Data Repositories (databases, warehouses, file shares, Big Data)
Monitoring Appliance
. Continuous, policy-based, real-time monitoring of all data traffic activities, including actions by privileged users
. Data protection compliance automation
. Centralize compliance reporting
. Real-time alerting

Key Requirements
. Implement on premise or cloud
. Non-invasive, non-disruptive, cross-platform architecture
. Should not rely on resident logs that can easily be erased by attackers, rogue insiders
. No environment changes
. Integration with broader privacy, security and compliance tools
. 100% visibility including local admin access
. Minimal performance impact
. Separation of duties enforcement for Database Administrator (DBA) access
. Detect or block unauthorized & suspicious activity
. Granular, real-time policies: who, what, when, how
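To make the "who, what, when, how" policy idea concrete, here is a minimal sketch of evaluating a single access event against a few rules. It is an illustration only and does not represent any product's policy engine; the event fields, object names, approved client list, and business hours are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    user: str       # who
    role: str       # who (class of user)
    action: str     # what was done, e.g. "SELECT"
    obj: str        # what was touched
    when: datetime  # when
    client: str     # how (tool or channel used)


# Hypothetical policy inputs.
SENSITIVE_OBJECTS = {"customers.card_number", "patients.diagnosis"}
APPROVED_CLIENTS = {"reporting_app"}


def evaluate(event):
    """Return the list of policy alerts raised by one access event."""
    alerts = []
    if event.obj in SENSITIVE_OBJECTS:
        if event.role == "dba":
            alerts.append("separation_of_duties: DBA accessed sensitive data")
        if event.client not in APPROVED_CLIENTS:
            alerts.append("unapproved_client: " + event.client)
        if not 8 <= event.when.hour < 18:
            alerts.append("off_hours_access")
    return alerts


suspicious = AccessEvent("alice", "dba", "SELECT", "customers.card_number",
                         datetime(2017, 1, 27, 23, 30), "sql_shell")
routine = AccessEvent("bob", "analyst", "SELECT", "customers.card_number",
                      datetime(2017, 1, 27, 10, 0), "reporting_app")
print(evaluate(suspicious))
print(evaluate(routine))
```

In a real deployment the alerts would feed real-time alerting or blocking rather than being printed.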

31 IBM Security Privileged User Activity Report: Sample Activity Monitoring Report

32 IBM Security Data Obfuscation Controls

Original Value: 4536 6382 9896 5200

Masking
. The ability to desensitize sensitive information and make it unreadable from its original form while preserving its format and referential integrity
. It is a one-way algorithm, i.e. no unmasking of data
. SDM – Static Data Masking
. DDM – Dynamic Data Masking
Masked Value: 4212 5454 6565 7780

Redaction
. The process of obscuring part of a text for security purposes.
. The ability to replace real data with substitute characters like (*)
Redacted Value: 4536 6382 **** ****

Tokenization
. The process of substituting a “token” which can be mapped to the original value
. Token is a non-sensitive equivalent which has no extrinsic value
. Must maintain a mapping between the tokens and the original values
Token Value: ABCD GDIC JIJG VXYZ

Encryption
. The process of encoding data in such a way that only authorized individuals can read it by decrypting the encoded data with a key
. Format Preserving Encryption (FPE) is a special form of encryption
Encrypted Value: 1@#43$%!xy1K2L4P

33 IBM Security Encrypt Data at Rest

Encryption can provide Safe Harbor protection from breach disclosure in many states (consult your compliance team for details)

Implement Data protection for your database, HADOOP, and file system environments
. Look for high performance encryption, access control and auditing
. Data privacy for both online and backup environments
. Unified policy and key management for centralized administration across multiple data servers

Look for transparency to users, databases, applications, storage
. No coding or changes to existing IT infrastructure
. Protect data in any storage environment
. User access to data same as before

Look for centralized administration and Separation of Duties
. Policy and Key management
. Audit logs
. High Availability
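Three of the obfuscation controls defined on the Data Obfuscation Controls slide (redaction, masking, tokenization) can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions, not a production implementation: the helper names and the in-memory `TokenVault` are hypothetical, the HMAC-based mask is just one simple way to get a one-way, format-preserving substitute, and it is not Format Preserving Encryption. Encryption is left out because it needs a vetted cryptographic library and key management.

```python
import hashlib
import hmac
import secrets


def redact(card, visible_groups=2):
    """Replace every character in all but the first `visible_groups` groups with '*'."""
    groups = card.split(" ")
    return " ".join(
        g if i < visible_groups else "*" * len(g)
        for i, g in enumerate(groups)
    )


def mask(card, key):
    """One-way mask that keeps the digit/separator layout of the input.

    The same input and key always give the same output, which keeps
    referential integrity across tables. Digits are derived from an HMAC,
    so the original cannot be recovered from the masked value.
    """
    digest = hmac.new(key, card.encode("utf-8"), hashlib.sha256).digest()
    out, i = [], 0
    for ch in card:
        if ch.isdigit():
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)  # keep separators so the format is preserved
    return "".join(out)


class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to original values."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, value):
        if value not in self._to_token:
            token = secrets.token_hex(8)  # random, carries no information
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def detokenize(self, token):
        return self._to_value[token]


original = "4536 6382 9896 5200"
redacted = redact(original)
masked = mask(original, b"demo-key-not-for-production")
vault = TokenVault()
token = vault.tokenize(original)
print(redacted, masked, token)
```

The design difference to note is reversibility: the mask and the redaction cannot be undone, while the token can be mapped back only by whoever holds the vault.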

34 IBM Security Identity and Access Management helps secure the digital identities for an open enterprise: Big and ‘Little’ Data

Datacenter Web Social Mobile Cloud

Threat-aware Identity and Access Management

Identity Management
• Identity Governance and Intelligence
• Identity Lifecycle Management
• Privileged Identity Control

Access Management
• Adaptive Access Control and Federation
• Application Content Protection
• Authentication and Single Sign On

Directory Services

On Premise Appliances | Software-as-a-Service | Cloud Managed / Hosted Services

35 IBM Security Putting it all together: Sample Solution Architecture

[Figure: sample solution architecture. Data sources (databases, files, documents) pass through masking and redaction into the Hadoop cluster (Big Data loader, HDFS, MapReduce processing, output files). Numbered components: (1) Information Catalog holding policies and business policies; (2) Sensitive data discovery; (3) Masking of databases, files, and Hadoop files, plus document redaction; (4) Big Data Activity Monitoring: monitor & audit Big Data access (HDFS, Hive, HBase, MapReduce, HUE, etc.) with real-time alerting and SIEM (Security Information and Event Monitoring) integration.]

Components | Capability
1 Information Catalogue | Define privacy policies and …
2 Sensitive Data Discovery | Discover and classify sensitive data
3 Masking and Redaction | Data masking and document redaction
4 Hadoop Activity Monitoring (HAM) | Monitor and audit Big Data (Hadoop) activity

36 IBM Security Best Practices: Build the foundation

First, know your data
1. Understand the data source, its “trust factor”, the data context and meaning, and how it maps to other enterprise data sources.
2. Determine whether to operationalize (and retain) specific data sources, and which zone to land the data, i.e. Hadoop, Data Warehouse, leave in place, etc.

Steps to Assess and Protect:
1. Conduct a Privacy Impact Assessment and a Security Risk Assessment.
2. Inventory and classify sensitive data.
3. Identify and match against legal, contractual, and organizational data protection requirements with assistance from your security, privacy, and compliance organization.
4. Identify protection standards for each classification. For example, all credit card numbers must be encrypted in accordance with PCI DSS.
5. Identify the gaps and set up remediation plans.
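Steps 2 and 4 above (inventory and classify, then attach a protection standard to each classification) can be sketched as a small lookup. The regular expressions, class names, and protection standards below are hypothetical placeholders; only the credit card rule comes from the slide, and real discovery tools use far richer detection than two patterns.

```python
import re

# Hypothetical detectors for two kinds of sensitive values.
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical protection standard per classification, most sensitive first.
PROTECTION_STANDARDS = {
    "payment_card": "encrypt (PCI DSS)",
    "personal_information": "mask in non-production, monitor access",
    "unclassified": "no special handling",
}


def classify(value):
    """Classify a single sample value."""
    if CARD_RE.search(value):
        return "payment_card"
    if SSN_RE.search(value):
        return "personal_information"
    return "unclassified"


def inventory(columns):
    """Map each column name to (classification, protection standard).

    `columns` maps a column name to a list of sample values; a column takes
    the most sensitive class found among its samples.
    """
    result = {}
    for name, samples in columns.items():
        found = {classify(s) for s in samples}
        label = next((c for c in PROTECTION_STANDARDS if c in found), "unclassified")
        result[name] = (label, PROTECTION_STANDARDS[label])
    return result


sample_columns = {
    "card_no": ["4536 6382 9896 5200"],
    "tax_id": ["123-45-6789"],
    "comment": ["shipped on time"],
}
report = inventory(sample_columns)
print(report)
```

Comparing such a report with the controls actually in place is one way to surface the gaps mentioned in step 5.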

37 IBM Security Wrap Up

Key messages for sound public policy

- Enable data innovation

- Focus on risks to people

- Protect privacy through principles, not prescription

- Accommodate diversity

- Help organizations manage diverse legal systems

- Encourage organizations to demonstrate accountability

39 IBM Security Summary: Keys to Success

1. Manage security and privacy at point of impact or as far upstream as possible.
2. Use multiple complementary approaches to secure critical data; different types of data have different protection requirements.
3. Use a holistic approach to safeguarding information no matter where it is. Include the following items:
   • Understand and document where the data exists along with the exposure risk.
   • Secure and continuously monitor access to data.
   • Safeguard both structured and unstructured data
   • Protect sandboxes and non-production environments
   • Demonstrate compliance to pass audits

40 IBM Security THANK YOU

FOLLOW US ON:

ibm.com/security

securityintelligence.com xforce.ibmcloud.com

@ibmsecurity

youtube/user/ibmsecuritysolutions

© Copyright IBM Corporation 2016. All rights reserved. The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. Any statement of direction represents IBM's current intent, is subject to change or withdrawal, and represents only goals and objectives. IBM, the IBM logo, and other IBM products and services are trademarks of the International Business Machines Corporation, in the United States, other countries or both. Other company, product, or service names may be trademarks or service marks of others.

Statement of Good Security Practices: IT system security involves protecting systems and information through prevention, detection and response to improper access from within and outside your enterprise. Improper access can result in information being altered, destroyed, misappropriated or misused or can result in damage to or misuse of your systems, including for use in attacks on others. No IT system or product should be considered completely secure and no single product, service or security measure can be completely effective in preventing improper use or access. IBM systems, products and services are designed to be part of a lawful, comprehensive security approach, which will necessarily involve additional operational procedures, and may require other systems, products or services to be most effective. IBM does not warrant that any systems, products or services are immune from, or will make your enterprise immune from, the malicious or illegal conduct of any party.

Resources

• Follow me on Twitter @CCBigData
• IBM Security: http://www-03.ibm.com/security/
• IBM Data Security & Protection: http://www-03.ibm.com/software/products/en/category/SWP23
• Data Security & Privacy Best Practices Blogs: https://securityintelligence.com/author/cindy-compert
• Guardium Activity Monitoring for Hadoop info page: http://ibm.biz/BdsdhR
• IBM QRadar Security Intelligence: http://www-03.ibm.com/software/products/en/qradar-siem
• IBM Redbook: “Information Governance Principles and Practices for a Big Data Landscape”: https://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg248165.html
• Top Tips for Securing Big Data Environments: www.ibm.com/services/forms/signup.do?source=sw-infomgt&S_PKG=500031830&S_CMP=Guardium_big_data_ebook

42 IBM Security A recommended approach for Big Data: Activity Monitoring

1. Identify users and classes of users (“privileged” users, data scientists…): who is allowed to access sensitive data?
   . Validate with activity monitoring
2. Identify the applications, jobs, ad-hoc analysis
   . Validate with activity monitoring
3. When possible, identify, encrypt and mask sensitive data before it enters the cluster, and identify a specific directory location in the cluster for that data. Put tighter monitoring controls around that data.
4. Look at exceptions – permission exceptions, other operational errors. Use machine learning to identify patterns of suspicious activity.
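Steps 1 and 4 above can be sketched as a simple review of an activity log: compare who actually touched the sensitive directory against the expected user list, and count permission exceptions per user. This sketch uses a fixed threshold in place of the machine learning mentioned in step 4; the log format, directory path, and threshold are assumptions made for illustration.

```python
from collections import Counter

SENSITIVE_DIR = "/data/sensitive/"  # hypothetical directory holding sensitive data


def review_activity(events, allowed_users, max_denied=3):
    """Summarize an activity log.

    `events` is a list of dicts with "user", "path", and "status"
    ("ok" or "denied"). Returns users who touched the sensitive directory
    without being on the allowed list, and users whose number of denied
    requests exceeds `max_denied`.
    """
    sensitive_users = {
        e["user"] for e in events if e["path"].startswith(SENSITIVE_DIR)
    }
    unexpected = sorted(sensitive_users - set(allowed_users))
    denied = Counter(e["user"] for e in events if e["status"] == "denied")
    excess = sorted(u for u, n in denied.items() if n > max_denied)
    return {"unexpected_users": unexpected, "excess_denials": excess}


# Hypothetical log entries.
events = [
    {"user": "alice", "path": "/data/sensitive/cards.csv", "status": "ok"},
    {"user": "bob", "path": "/data/sensitive/cards.csv", "status": "ok"},
] + [
    {"user": "carol", "path": "/data/finance/ledger.csv", "status": "denied"}
    for _ in range(4)
]

summary = review_activity(events, allowed_users={"alice"})
print(summary)
```

A fixed threshold is easy to reason about but coarse; a learned baseline per user or job would be the natural next step.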

43 IBM Security Notices and disclaimers

• Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.

• U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

• Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

• IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.

• Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

• Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

• References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

• Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

• It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

WORLD OF WATSON 2016 44 IBM Security Notices and disclaimers continued

• Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

• The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

• IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.

• Notice: Clients are responsible for ensuring their own compliance with various laws and regulations, including the European Union General Data Protection Regulation. Clients are solely responsible for obtaining advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulations that may affect the clients’ business and any actions the clients may need to take to comply with such laws and regulations. The products, services, and other capabilities described herein are not suitable for all client situations and may have restricted availability. IBM does not provide legal, accounting or auditing advice or represent or warrant that its services or products will ensure that clients are in compliance with any law or regulation.
