Inference Control and Privacy Preservation in Data Mining
Suggested reviewers: Dan Simovici, Xintao Wu
Motivation: Recent years have seen a staggering increase in the volume of information exchanged over the Internet and, with it, in the amount of personally identifiable information that is exchanged online or stored in data repositories. This raises concerns about individual privacy rights and how to protect them through regulation and technology. This module aims to shed light on current privacy and data protection issues and on some of the methods that help protect privacy.
Target Audience: Senior computer science majors. The module can also be used at the graduate level with additional assignments.
Prerequisites: Databases, data structures, programming.
Module Objectives
Give students an overview of privacy concepts and requirements.
Present students with the techniques used to preserve private information.
Present current privacy preserving techniques in data mining.
Module Organization
1. The Concept of Privacy (2 hours)
a) Definition of Privacy and Data Protection
b) Privacy and Security
c) Privacy and Legislation
i. Legal: Individual Rights, Human Rights, Fourth Amendment, HIPAA
ii. Organizational Privacy
iii. Informational Privacy: Digital Identities and Online Privacy Regulations
d) Types of Privacy Attacks
e) Evaluating Privacy Techniques: Utility Functions, Disclosure Factor
2. Data-Centered Privacy Protection Methods (2 hours)
a) What Data to Hide
b) Data Partitioning: Horizontal versus Vertical
c) Data Modification: Aggregation, Blocking, Perturbation, Swapping, Sampling
d) Data Hiding: Cryptography-Based Techniques
3. Privacy-Preserving Data Mining Techniques (2 hours)
a) Data Obfuscation
b) Data Summarization
c) Data Separation
d) Inference Control
i. Confidential Data: Legal Requirements and Societal Expectations
ii. Data Aggregation
iii. Statistical Databases: Inference Control
iv. Conclusion
e) Privacy-Preserving Association Rule Mining
i. Horizontal Data Partitioning
ii. The ID3 Algorithm
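To make the inference-control item in Section 3 concrete, the following is a minimal Python sketch of one classic control for statistical databases, query-set-size restriction: an aggregate query is answered only if the number of records it covers, |C|, satisfies k ≤ |C| ≤ N − k, where N is the total number of records. The upper bound matters because a query over the complement of a small set would otherwise isolate the same individuals. The class name `StatisticalDB`, the threshold `k`, and the sample salary records are hypothetical, chosen only for illustration. Size restriction alone is known to be vulnerable to tracker attacks, which combine several permitted queries to isolate an individual, so in practice it is usually paired with other controls such as auditing or perturbation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


@dataclass
class StatisticalDB:
    """Hypothetical statistical database that answers only aggregate queries."""
    records: List[Record]
    k: int  # minimum allowed query-set size (illustrative parameter)

    def _query_set(self, predicate: Predicate) -> List[Record]:
        # The query set C is every record matching the query's predicate.
        return [r for r in self.records if predicate(r)]

    def _allowed(self, size: int) -> bool:
        # Refuse sets that are too small (they single out individuals) and
        # sets that are too large (their complement singles them out).
        n = len(self.records)
        return self.k <= size <= n - self.k

    def count(self, predicate: Predicate) -> Optional[int]:
        qs = self._query_set(predicate)
        return len(qs) if self._allowed(len(qs)) else None  # None means refused

    def total(self, predicate: Predicate, attr: str) -> Optional[float]:
        qs = self._query_set(predicate)
        if not self._allowed(len(qs)):
            return None  # refused
        return sum(r[attr] for r in qs)


# Hypothetical sample data: six employees with a confidential salary field.
records = [
    {"name": "A", "dept": "CS", "salary": 70},
    {"name": "B", "dept": "CS", "salary": 80},
    {"name": "C", "dept": "EE", "salary": 90},
    {"name": "D", "dept": "EE", "salary": 60},
    {"name": "E", "dept": "ME", "salary": 75},
    {"name": "F", "dept": "ME", "salary": 65},
]
db = StatisticalDB(records, k=2)  # with N = 6, only query sets of size 2..4 are answered

print(db.count(lambda r: r["dept"] == "CS"))            # 2 records: answered, prints 2
print(db.total(lambda r: r["name"] == "C", "salary"))   # 1 record: refused, prints None
print(db.total(lambda r: r["name"] != "C", "salary"))   # 5 records: refused, prints None
print(db.total(lambda r: r["dept"] == "EE", "salary"))  # 2 records: answered, prints 150
```

The third query shows why the upper bound is needed: without it, subtracting that answer from the total payroll would reveal C's salary.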
Exercise: Privacy-Preserving Association Mining
Assume the data is horizontally partitioned:
– Each site has complete information on a set of entities.
– The same attributes are present at each site.
The goal is to mine association rules without disclosing the entities held at any site. Develop an efficient association mining algorithm that meets this goal.
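As a hedged starting point for the exercise, the sketch below shows one building block often used with horizontally partitioned data: a secure sum that lets the sites decide whether an itemset is globally frequent without revealing any site's local support count. Each site i computes the excess 100·count_i − s·|DB_i|, where s is the minimum support in percent. The itemset is globally frequent exactly when the excesses sum to a non-negative value, since that is equivalent to Σcount_i / Σ|DB_i| ≥ s/100. The protocol is simulated in one process, and the function names (`local_excess`, `secure_sum`, `globally_frequent`), the modulus, and the sample transactions are assumptions for illustration. It also assumes semi-honest sites that do not collude (the two neighbours of a site in the ring could together recover its value), and it does not address candidate generation or combining the locally frequent itemsets, which a complete answer would still need.

```python
import secrets
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def local_excess(db: List[Transaction], itemset: FrozenSet[str], minsup_pct: int) -> int:
    """Computed privately at one site: 100*count - minsup_pct*|DB_i|."""
    count = sum(1 for t in db if itemset <= t)  # transactions containing the itemset
    return 100 * count - minsup_pct * len(db)


def secure_sum(values: List[int], modulus: int) -> int:
    """Simulated ring-based secure sum of signed integers.

    Site 0 masks its value with a random R; each later site adds its own
    value mod `modulus` to the running total it receives, so each message
    on its own is uniformly distributed. Site 0 finally removes R.
    Correct when -(modulus // 2) <= sum(values) < modulus // 2.
    """
    r = secrets.randbelow(modulus)
    running = (r + values[0]) % modulus    # message sent by site 0
    for v in values[1:]:
        running = (running + v) % modulus  # message sent by each later site
    total = (running - r) % modulus        # site 0 unmasks the result
    # Map the residue back to a signed integer.
    return total - modulus if total >= modulus // 2 else total


def globally_frequent(sites: List[List[Transaction]], itemset: FrozenSet[str],
                      minsup_pct: int, modulus: int = 2**64) -> bool:
    # In a real deployment each excess stays at its own site; here they are
    # gathered in a list only to drive the simulation.
    excesses = [local_excess(db, itemset, minsup_pct) for db in sites]
    return secure_sum(excesses, modulus) >= 0


# Hypothetical transactions at three sites (8 transactions in total).
site_a = [frozenset({"bread", "milk"}), frozenset({"bread", "butter"}), frozenset({"milk"})]
site_b = [frozenset({"bread", "milk"}), frozenset({"bread", "milk", "eggs"})]
site_c = [frozenset({"eggs"}), frozenset({"bread"}), frozenset({"milk", "butter"})]
sites = [site_a, site_b, site_c]

bread_milk = frozenset({"bread", "milk"})
print(globally_frequent(sites, bread_milk, 30))  # 3 of 8 = 37.5% >= 30%: True
print(globally_frequent(sites, bread_milk, 40))  # 37.5% < 40%: False
```

In this sketch the initiating site learns the global excess itself, not only its sign; hiding even that would require an additional secure comparison step.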