“Big Data- What’s the Big Deal?”
DATA SECURITY & PRIVACY FOR BIG DATA
Cindy E. Compert CIPT/M CTO Data Security & Privacy, IBM Security @CCBigData
1/27/17
“A ship in port is safe; but that is not what ships are built for. Sail out to sea and do new things” – Grace Hopper
2 IBM Security Agenda
• Introduction
• Mega Trends
• Security & Privacy considerations
• Architecture, Technical controls, best practices
• Wrap up
3 IBM Security Short History Lesson
“Big Data”: Volume, Variety, Velocity; MapReduce, Hadoop, HDFS
4 IBM Security Big Data Grows Up
5 IBM Security Client Challenges: Megatrends
. Evolving regulatory patchwork
. Breach threats/costs, reputational risk, sanctions
. The ‘Snowden’ effect
. Maintain privacy, encourage innovation
1 Forrester Research: “Understand The State Of Data Security And Privacy: 2014 To 2015”
6 IBM Security Digital Convergence: IoT, Analytics, Big Data, Cognitive, Cloud
Analytics that Learn
HEALTH: Watson Oncology- choose cancer treatment therapies based on a tumor's genetic fingerprints
EDUCATION: Bringing personalized learning to children around the world
EDUCATION: Cognitoys- Toy Dino uses cognitive-enabled learning for customized interaction
Watson Personality Insights: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/personality-insights.html
8 IBM Security A Data Lake is a Data Scientist’s Dream!
… but data without analytics is just a liability
9 IBM Security Security and Privacy Considerations
Security compared to Privacy in a nutshell
Confidentiality: Preventing access to non-public information that two parties agree to restrict. May relate to personal or business information. May not be subject to Privacy laws.
Regulated Data: Government and/or industry regulation, including PI (Personal Information), healthcare (e.g. HIPAA/HITECH), and/or financial information (e.g. FFIEC).
Data Privacy: Controls how personal or regulated information is collected, used, and shared in accordance with policy and/or external laws/regulations.
Data Security: The technical safeguards used to ensure confidentiality, integrity, and availability of data.
PI – Personal Information: Any information that identifies or can reasonably be used to identify, contact, or locate an individual.
Diagram labels: Regulatory or 3rd Party Requirements; Privacy; Confidentiality; Security; Security & Privacy
11 IBM Security Why is Big Data different?
. More data, exponentially more risk
. Immature- less security, governance and discipline, rapidly evolving*
. New types of data, new privacy implications
  . Smart meters, health monitors, connected home, connected car
. Linked data– linking public and private data exposes new risks
. New uses of data mapping to privacy policy
* http://www.techrepublic.com/article/cios-still-dont-care-about-hadoop-data-security/
12 IBM Security Cool or Creepy?
http://www.zdnet.com/pictures/nine-warning-signs-that-your-technology-needs-an-upgrade/2/
13 EU GDPR will change the Analytics and Cognitive Landscape
• Definition of “Personal Data” now explicitly includes online identifiers, location data and biometric/genetic data
• Higher standards for privacy notices and for obtaining consent
• Easier access to personal data by a data subject
• Enhanced right to request the erasure of their personal data
• Right to transfer personal data to another organization (portability)
• Right to object to processing now explicitly includes profiling.
14 IBM Security Big Data life cycle – from raw to production
Search & survey: Business Users (With An Idea), Power Users, Data Analysts
. text search
. simple investigations
. peek / poke
Exploratory analysis: Data Scientist / Data Miner, Advanced Business User, Application Developer
. from mountain of data into a structured world with apps to provide business value
. iterative in nature, many false starts
Operational: Traditional IT / Application Developer
. creating/standing up applications, processes, systems with enterprise characteristics
. more formal environment, SLAs, etc
15 IBM Security Fit-for-purpose security and privacy
Initial / exploratory use cases | Used for business decisions
Few security or privacy concerns | Protect, Secure, Encrypt
Sporadic change management | Audit trail tracking access & changes
No data retention requirements | Preserve data for N years
Little to no regulation | Legislated requirements
No / isolated data quality concerns | Data quality imperatives
Sources of information are “interesting” | Sources must be trusted
No difference in data governance requirements once the data is used for making operational business decisions
16 IBM Security Privacy is the ‘Why’ and ‘What’… Security is the ‘How’
PI, PII, PHI, NPI.. What is ‘Personal’? It Depends
CAUTION: Your Legal, Compliance, and Privacy Organization makes a determination of how to enforce privacy regulations, based on risk. IT and InfoSec should not be the arbiters.
18 IBM Security How unique are you?
• Dr. Latanya Sweeney (Harvard, FTC Chief Technologist): a 1997 study using US Census data predicted that 87 percent of the U.S. population had a unique combination of just date of birth, gender, and zip code
• Try it yourself here: http://aboutmyinfo.org
• An additional study on the Personal Genome Project re-identified 84-97% of records, also using demographics plus data mining (http://dataprivacylab.org/projects/pgp/1021-1.pdf)
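The uniqueness effect described above can be checked on any dataset by counting how many records carry a one-of-a-kind combination of quasi-identifiers. The following is a minimal sketch, not the method used in the cited studies: the records, field names, and the `generalize` coarsening step (birth year only, 3-digit zip prefix) are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy records; field names and values are illustrative only.
RECORDS = [
    {"dob": "1970-03-12", "gender": "F", "zip": "02138", "diagnosis": "A"},
    {"dob": "1970-03-12", "gender": "F", "zip": "02138", "diagnosis": "B"},
    {"dob": "1970-09-30", "gender": "F", "zip": "02139", "diagnosis": "C"},
    {"dob": "1990-07-21", "gender": "F", "zip": "10001", "diagnosis": "D"},
]

QUASI_IDENTIFIERS = ("dob", "gender", "zip")


def unique_fraction(records, keys=QUASI_IDENTIFIERS):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


def generalize(record):
    """Coarsen quasi-identifiers (an assumed mitigation): birth year and 3-digit zip."""
    out = dict(record)
    out["dob"] = record["dob"][:4]
    out["zip"] = record["zip"][:3]
    return out


print(unique_fraction(RECORDS))                           # 0.5
print(unique_fraction([generalize(r) for r in RECORDS]))  # 0.25
```

A high unique fraction suggests that the dataset could be linked to public data even after names are removed; coarsening the fields lowers it at the cost of analytic detail.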
19 IBM Security Location Location Location
20 IBM Security Questions to ask
1. Where is the sensitive data?
2. Who owns it?
3. How is it classified and managed?
4. How do you know who is accessing it?
5. Where is it flowing?
6. How is it shared?
7. How is it used in test environments?
8. What about 3rd parties and vendor access?
9. What is the quantifiable risk?
10. How do you prioritize discovery and classification?
21 IBM Security 5 steps to a Critical Data Protection Program
The Approach: A comprehensive method for safeguarding your Crown Jewels and protecting your brand
• Define Crown Jewels • Determine Data Security Objectives
• Understand Client Data Security Environment and Infrastructure • Define and Complete Data Discovery Process • Perform Data Analysis and Classify
• Establish Crown Jewels Baselines • Assess and Score Client Data Security Processes and/or Controls • Perform Gap Analysis and Develop Hypotheses
• Determine Risk Remediation Plan • Prioritize and Validate Risk Remediation Solutions • Plan, Design, and Implement
• Determine Crown Jewels Governance Metrics and Process • Enable Monitoring, Communications and Response • Establish Revalidation Criteria and Process
22 IBM Security Where Next? Data Classification
Figure labels (hazard placards): Non-flammable; Spontaneously Flammable; When combined with water; Non-toxic; Health Hazard; Toxic; Explosive
23 IBM Security Architecture, Technical Controls, Best Practices
Security is Security.. Same Disciplines apply… BUT..
Figure labels (security capability map): Global Threat Intelligence; Antivirus; Endpoint patching and management; Malware protection; Incident and threat management; Transaction protection; Firewalls; Device management; Sandboxing; Content security; Virtual patching; Network visibility; Fraud protection; Log, flow and data analysis; Criminal detection; Security Intelligence; Application scanning; Anomaly detection; Application security management; Vulnerability assessment; Incident response; Privileged identity management; Data monitoring; Cloud; Data access control; Entitlements and roles; Access management; Identity management
Consulting Services | Managed Services
25 IBM Security Big Data Technical Components
Understand and navigate federated big data sources | Federated Discovery and Navigation
Manage & store huge volume of any data | Hadoop File System, Apache Spark, MapReduce
Structure and control data | Data Warehousing, In memory, Cloud databases (Spark, Cloudant)
Manage streaming data | Stream Computing
Analyze unstructured data | Text Analytics Engine
Integrate and govern all data sources | Integration, Data Quality, Security, Lifecycle Management, MDM
26 IBM Security A Hadoop Security Architecture
Figure labels: Static Data (at rest); Dynamic Data (in use); ..and masking
http://www.hadoopsphere.com/2013/01/security-architecture-for-apache-hadoop.html
27 IBM Security Monitoring and auditing challenges
• Many avenues to access
• Security and authentication are evolving
• Complex software stack with significant log data from each component
• Security and audit viewed in isolation from rest of data architecture
28 IBM Security Data Security and Privacy Core Disciplines
Security Controls Core Disciplines: The ‘How’
Understand & Define
. Discover sensitive assets & who has access
. Classify Assets & Quantify risk
. Assess Vulnerabilities
Secure & Protect
. Implement Identity & Access Management, Activity Monitoring
. Redact/encrypt/mask sensitive data in all environments
. Harden environments to reduce risk
Monitor & Audit
. Define policies and metrics
. Monitor and enforce; Review policy exceptions
. Audit and report for compliance
29 IBM Security Security Controls for Privacy
On-Premise | Hybrid | Cloud
Manage Access: Enforce separation of duties; safeguard privileged user access, applications, and devices
Protect Data: Identify vulnerabilities; prevent attacks targeting sensitive data
Gain Visibility: Monitor data and applications: security breaches, compliance violations
• Data Encryption, Masking, Redaction
• Identity Governance
• Security Information and Event Monitoring
• Security Intelligence
• Privileged Identity Management
• Real-time alerting and blocking
• Data and File Activity Monitoring
• Mobile Data Management
• Cloud access and risk assessment
• Application and Mobile App Scanning
Optimize Your Privacy and Data Security Program: Deliver a consolidated view of your security operations
• Privacy Program Management • Security & Privacy Risk and Performance Metrics
30 IBM Security Utilize real-time data activity monitoring for privacy, security & compliance
Figure labels: Data Repositories (databases, warehouses, file shares, Big Data); Monitoring Appliance
. Continuous, policy-based, real-time monitoring of all data traffic activities, including actions by privileged users
. Centralize compliance reporting
. Data protection compliance automation
. Real-time alerting
Key Requirements
. Implement on premise or cloud
. Non-invasive/non-disruptive, cross-platform architecture
. Should not rely on resident logs that can easily be erased by attackers, rogue insiders
. No environment changes
. Integration with broader privacy, security and compliance tools
. 100% visibility including local admin access
. Minimal performance impact
. Separation of duties enforcement for Database Administrator (DBA) access
. Detect or block unauthorized & suspicious activity
. Granular, real-time policies
  . Who, what, when, how
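A granular, real-time policy of the kind listed above can be pictured as a function that takes one access event (who, what, when, how) and returns a verdict. The sketch below is an illustration only and does not reflect any product's policy language; the `AccessEvent` fields, the `/data/sensitive/` prefix, the role names, and the business-hours window are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    user: str            # who
    role: str            # hypothetical role label, e.g. "dba" or "analyst"
    action: str          # what, e.g. "SELECT" or "DROP"
    target: str          # object that was touched
    timestamp: datetime  # when
    client: str          # how, e.g. "hive" or "hdfs-cli"


SENSITIVE_PREFIX = "/data/sensitive/"  # assumed location of classified data


def evaluate(event: AccessEvent) -> str:
    """Return "BLOCK", "ALERT", or "ALLOW" for a single event."""
    sensitive = event.target.startswith(SENSITIVE_PREFIX)
    # Separation of duties: administrators manage the platform but
    # should not read the sensitive data it holds.
    if sensitive and event.role == "dba":
        return "BLOCK"
    # Sensitive access outside assumed business hours (08:00-18:00) is suspicious.
    if sensitive and not (8 <= event.timestamp.hour < 18):
        return "ALERT"
    # Destructive statements from non-administrators deserve a look.
    if event.action in {"DROP", "DELETE"} and event.role != "dba":
        return "ALERT"
    return "ALLOW"


event = AccessEvent("dan", "dba", "SELECT", "/data/sensitive/cards",
                    datetime(2017, 1, 27, 10, 0), "hive")
print(evaluate(event))  # BLOCK
```

Evaluating each event as it arrives, rather than mining resident logs afterwards, is what allows blocking as well as alerting.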
31 IBM Security Sample Activity Monitoring Report: Privileged User Activity Report
32 IBM Security Data Obfuscation Controls
Original Value: 4536 6382 9896 5200
Masking
. The ability to desensitize sensitive information and make it unreadable from its original form while preserving its format and referential integrity
. It is a one-way algorithm, i.e. no unmasking of data
. SDM – Static Data Masking
. DDM – Dynamic Data Masking
Masked Value: 4212 5454 6565 7780
Redaction
. The process of obscuring part of a text for security purposes.
. The ability to replace real data with substitute characters like (*)
Redacted Value: 4536 6382 **** ****
Tokenization
. The process of substituting a “token” which can be mapped to the original value
. Token is a non-sensitive equivalent which has no extrinsic value
. Must maintain a mapping between the tokens and the original values
Token Value: ABCD GDIC JIJG VXYZ
Encryption
. The process of encoding data in such a way that only authorized individuals can read it by decrypting the encoded data with a key
. Format Preserving Encryption (FPE) is a special form of encryption
Encrypted Value: 1@#43$%!xy1K2L4P
33 IBM Security Encrypt Data at Rest
Encryption can provide Safe Harbor protection from breach disclosure in many states (consult your compliance team for details)
Implement Data protection for your database, Hadoop, and file system environments
. Look for high performance encryption, access control and auditing
. Data privacy for both online and backup environments
. Unified policy and key management for centralized administration across multiple data servers
Look for transparency to users, databases, applications, storage
. No coding or changes to existing IT infrastructure
. Protect data in any storage environment
. User access to data same as before
Look for centralized administration and Separation of Duties
. Policy and Key management
. Audit logs
. High Availability
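To make the difference between the obfuscation controls defined on the Data Obfuscation Controls slide concrete, the sketch below applies redaction, masking, and tokenization to a card-like value. It is an illustration under stated assumptions, not any product's algorithm: the keyed-hash digit substitution in `mask` is a stand-in technique chosen here because it is one-way and repeatable, and `TokenVault` is a hypothetical in-memory mapping. Encryption is left out because it would need a vetted cryptographic library.

```python
import hashlib
import hmac
import secrets


def redact(value: str, keep: int = 8) -> str:
    """Replace every digit after the first `keep` digits with '*'."""
    out, seen = [], 0
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= keep else "*")
        else:
            out.append(ch)
    return "".join(out)


def mask(value: str, key: bytes) -> str:
    """One-way, format-preserving mask (assumed technique): digits are replaced
    by digits derived from a keyed hash, so equal inputs yield equal outputs,
    which keeps joins across tables working. Byte-mod-10 is slightly biased,
    which is acceptable for a sketch but not for production use."""
    digest = hmac.new(key, value.encode(), hashlib.sha256).digest()
    out, i = [], 0
    for ch in value:
        if ch.isdigit():
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


class TokenVault:
    """Hypothetical in-memory vault holding the token-to-value mapping."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, value: str) -> str:
        if value not in self._to_token:
            token = secrets.token_hex(8)  # random, carries no information
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def detokenize(self, token: str) -> str:
        return self._to_value[token]


print(redact("4536 6382 9896 5200"))  # 4536 6382 **** ****
```

The design point is reversibility: redaction and masking discard the original, while tokenization keeps it recoverable only through the vault, which then becomes the asset to protect.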
34 IBM Security Identity and Access Management helps secure the digital identities for an open enterprise: Big and ‘Little’ Data
Datacenter Web Social Mobile Cloud
Threat-aware Identity and Access Management
Identity Management
• Identity Governance and Intelligence
• Identity Lifecycle Management
• Privileged Identity Control
Access Management
• Adaptive Access Control and Federation
• Application Content Protection
• Authentication and Single Sign On
Directory Services
On Premise Appliances | Software-as-a-Service | Cloud Managed / Hosted Services
35 IBM Security Putting it all together: Sample Solution Architecture
Diagram labels: Data sources (databases, files, documents); Masking; Redaction; Big Data Loader; Hadoop cluster with Big Data files (HDFS), Big Data Processing (MapReduce), and output files; masked files and redacted documents; business policies; real-time alerting and SIEM (Security Information and Event Monitoring) integration; monitor & audit Big Data access (HDFS, Hive, HBase, MapReduce, HUE, etc.)
Components | Capability
1 Information Catalogue | Define privacy policies and share
2 Sensitive Data Discovery | Discover and classify sensitive data
3 Masking and Redaction | Data masking and document redaction
4 Hadoop Activity Monitoring (HAM) | Monitor and audit Big Data (Hadoop) activity
36 IBM Security Best Practices: Build the foundation
First, know your data
1. Understand the data source, its “trust factor”, the data context and meaning, and how it maps to other enterprise data sources.
2. Determine whether to operationalize (and retain) specific data sources, and in which zone to land the data, i.e. Hadoop, Data Warehouse, leave in place, etc.
Steps to Assess and Protect:
1. Conduct a Privacy Impact Assessment and a Security Risk Assessment.
2. Inventory and classify sensitive data.
3. Identify and match against legal, contractual, and organizational data protection requirements with assistance from your security, privacy, and compliance organization.
4. Identify protection standards for each classification. For example, all credit card numbers must be encrypted in accordance with PCI DSS.
5. Identify the gaps and set up remediation plans.
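Steps 2, 4, and 5 above (classify, map each classification to a protection standard, find the gaps) can be sketched in a few lines. This is a simplified illustration, not a discovery tool: the regular expression, the `REQUIRED_CONTROL` table, and the inventory layout are assumptions, and only the credit card example from the slide is modeled. The Luhn checksum is used to cut down false positives from arbitrary digit runs.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

# Assumed protection standard per classification (slide example: card
# numbers must be encrypted in accordance with PCI DSS).
REQUIRED_CONTROL = {"credit_card": "encrypt"}


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits of `number`, ignoring separators."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0


def classify(sample: str) -> str:
    """Label a text sample; only one classification is modeled here."""
    for match in CARD_PATTERN.finditer(sample):
        if luhn_valid(match.group()):
            return "credit_card"
    return "unclassified"


def find_gaps(inventory):
    """inventory maps a field name to (sample_text, set_of_applied_controls).
    Returns (field, missing_control) pairs to feed a remediation plan."""
    gaps = []
    for name, (sample, controls) in inventory.items():
        required = REQUIRED_CONTROL.get(classify(sample))
        if required and required not in controls:
            gaps.append((name, required))
    return gaps


# Hypothetical inventory using a well-known test card number.
INVENTORY = {
    "orders.notes": ("paid with 4111 1111 1111 1111", set()),
    "orders.card": ("4111-1111-1111-1111", {"encrypt"}),
    "orders.sku": ("1234 5678 9012 3456", set()),
}
print(find_gaps(INVENTORY))  # [('orders.notes', 'encrypt')]
```

The free-text `orders.notes` field is the typical finding: sensitive values often leak into columns nobody classified.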
37 IBM Security Wrap Up
Key messages for sound public policy
- Enable data innovation
- Focus on risks to people
- Protect privacy through principles, not prescription
- Accommodate diversity
- Help organizations manage diverse legal systems
- Encourage organizations to demonstrate accountability
39 IBM Security Summary: Keys to Success
1. Manage security and privacy at point of impact or as far upstream as possible.
2. Use multiple complementary approaches to secure critical data- different types of data have different protection requirements
3. Use a holistic approach to safeguarding information no matter where it is. Include the following items:
• Understand and document where the data exists along with the exposure risk.
• Secure and continuously monitor access to data.
• Safeguard both structured and unstructured data
• Protect sandboxes and non-production environments
• Demonstrate compliance to pass audits
40 IBM Security THANK YOU
www.ibm.com/security
FOLLOW US ON:
ibm.com/security
securityintelligence.com xforce.ibmcloud.com
@ibmsecurity
youtube/user/ibmsecuritysolutions
© Copyright IBM Corporation 2016. All rights reserved. The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. Any statement of direction represents IBM's current intent, is subject to change or withdrawal, and represents only goals and objectives. IBM, the IBM logo, and other IBM products and services are trademarks of the International Business Machines Corporation, in the United States, other countries or both. Other company, product, or service names may be trademarks or service marks of others.
Statement of Good Security Practices: IT system security involves protecting systems and information through prevention, detection and response to improper access from within and outside your enterprise. Improper access can result in information being altered, destroyed, misappropriated or misused or can result in damage to or misuse of your systems, including for use in attacks on others. No IT system or product should be considered completely secure and no single product, service or security measure can be completely effective in preventing improper use or access. IBM systems, products and services are designed to be part of a lawful, comprehensive security approach, which will necessarily involve additional operational procedures, and may require other systems, products or services to be most effective. IBM does not warrant that any systems, products or services are immune from, or will make your enterprise immune from, the malicious or illegal conduct of any party.
Resources
• Follow me on Twitter @CCBigData
• IBM Security: http://www-03.ibm.com/security/
• IBM Data Security & Protection: http://www-03.ibm.com/software/products/en/category/SWP23
• Data Security & Privacy Best Practices Blogs: https://securityintelligence.com/author/cindy-compert
• Guardium Activity Monitoring for Hadoop info page: http://ibm.biz/BdsdhR
• IBM QRadar Security Intelligence: http://www-03.ibm.com/software/products/en/qradar-siem
• IBM Redbook: “Information Governance Principles and Practices for a Big Data Landscape”: https://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg248165.html
• Top Tips for Securing Big Data Environments: www.ibm.com/services/forms/signup.do?source=sw-infomgt&S_PKG=500031830&S_CMP=Guardium_big_data_ebook
42 IBM Security A recommended approach for Big Data: Activity Monitoring
1. Identify users and classes of users – “privileged” users, data scientists… Who is allowed to access sensitive data?
  . Validate with activity monitoring
2. Identify the applications, jobs, ad-hoc analysis
  . Validate with activity monitoring
3. When possible, identify, encrypt and mask sensitive data before it enters the cluster, and identify a specific directory location in the cluster for that data. Put tighter monitoring controls around that data.
4. Look at exceptions – permission exceptions, other operational errors. Use machine learning to identify patterns of suspicious activity.
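Steps 1 and 4 of this approach can be sketched against a list of monitored events. The example is a simplified illustration: the event layout, the `ALLOWED` directory-to-users table, and the status string are hypothetical, and the median-based threshold is a plain statistical stand-in for the machine learning the slide recommends, not an implementation of it.

```python
from collections import Counter
from statistics import median

# Assumed mapping from a sensitive cluster directory to the identities
# that were approved to read it (step 1).
ALLOWED = {"/cluster/sensitive": {"alice", "etl_job"}}


def unauthorized_access(events, allowed=ALLOWED):
    """Validate the approved-user list against what monitoring actually saw."""
    hits = []
    for e in events:
        for prefix, users in allowed.items():
            if e["path"].startswith(prefix) and e["user"] not in users:
                hits.append((e["user"], e["path"]))
    return hits


def exception_outliers(events, factor=3):
    """Flag users whose permission-exception count exceeds `factor` times
    the median count across users (step 4, simple heuristic)."""
    counts = Counter(e["user"] for e in events
                     if e["status"] == "PERMISSION_DENIED")
    if not counts:
        return []
    baseline = max(median(counts.values()), 1)
    return sorted(u for u, c in counts.items() if c > factor * baseline)


# Hypothetical monitored events.
EVENTS = (
    [{"user": "bob", "path": "/cluster/sensitive/cards", "status": "PERMISSION_DENIED"}] * 10
    + [{"user": "alice", "path": "/cluster/sensitive/cards", "status": "OK"}]
    + [{"user": "alice", "path": "/cluster/tmp/x", "status": "PERMISSION_DENIED"}]
    + [{"user": "carol", "path": "/cluster/tmp/y", "status": "PERMISSION_DENIED"}]
)
print(exception_outliers(EVENTS))  # ['bob']
```

Placing sensitive data under one known directory, as step 3 suggests, is what makes a prefix check like this feasible.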
43 IBM Security Notices and disclaimers
• Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.
• U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
• Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
• IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.
• Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
• Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
• References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
• Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
• It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.
WORLD OF WATSON 2016
44 IBM Security Notices and disclaimers continued
• Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
• The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
• IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
• Notice: Clients are responsible for ensuring their own compliance with various laws and regulations, including the European Union General Data Protection Regulation. Clients are solely responsible for obtaining advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulations that may affect the clients’ business and any actions the clients may need to take to comply with such laws and regulations. The products, services, and other capabilities described herein are not suitable for all client situations and may have restricted availability. IBM does not provide legal, accounting or auditing advice or represent or warrant that its services or products will ensure that clients are in compliance with any law or regulation.