De-Identification of Clinical Data

TEPR Conference 2008, Ft. Lauderdale, Florida, May 17 - 21, 2008
Sepideh Khosravifar, CISSP, Info Security Analyst IV

Background

One of the major challenges facing Medical Informatics is creating data sets for research and testing that maintain patient privacy. De-identification is a required element of information integration, reducing the risks of unauthorized disclosure.

Anonymization

Anonymization is the process that removes the association between a data set and the data subject. It can be done in the following ways:
(1) Removing or transforming identifying characteristics in the data set so that the association is not unique and relates to more than one data subject.
(2) Increasing the population in the data subjects set so that the association between the data set and the data subject is not unique.
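As an illustration of approach (1), the sketch below coarsens two identifying characteristics and then checks how many records share each coarsened combination. It is not taken from ISO/IEC DTS 25237; the field names, the helper names `generalize` and `smallest_group`, and the sample records are all made up for this example.

```python
from collections import Counter

def generalize(record):
    """Transform identifying characteristics into coarser values (hypothetical rule)."""
    decade = (record["age"] // 10) * 10
    age_band = f"{decade}-{decade + 9}"        # e.g. 34 -> "30-39"
    zip_prefix = record["zip"][:3] + "**"      # keep only the first three digits
    return (age_band, zip_prefix)

def smallest_group(records):
    """Size of the smallest group of records sharing the same generalized values."""
    counts = Counter(generalize(r) for r in records)
    return min(counts.values())

# Fabricated sample records, for illustration only.
records = [
    {"age": 34, "zip": "33301"},
    {"age": 37, "zip": "33316"},
    {"age": 52, "zip": "33304"},
    {"age": 58, "zip": "33312"},
]

# A result of 1 would mean at least one record is still unique after generalization.
print(smallest_group(records))
```

If the smallest group has more than one member, each generalized combination relates to more than one data subject, which is the condition described in (1).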

Source: ISO/IEC DTS 25237

Pseudonymization

Pseudonymization is a particular type of anonymization that both removes the association with a data subject and adds an association between a particular set of characteristics relating to the data subject and one or more pseudonyms. It provides a means for information to be linked to the same person across multiple data records without revealing the identity of the person as a data subject.

Source: ISO/IEC DTS 25237
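One possible way to link records for the same person without exposing the identifier is a keyed hash. This is a minimal sketch under that assumption; the slides do not say which mechanism the standard recommends, and the function name `pseudonym`, the `PSN-` prefix, and the sample identifiers are hypothetical.

```python
import hashlib
import hmac

def pseudonym(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a patient identifier using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    can be linked; without the key the pseudonym cannot be recomputed.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return "PSN-" + digest.hexdigest()[:16]

# Assumption: the key is held only by the party doing the pseudonymization.
key = b"example-key-not-for-production"

visit_1 = {"patient": pseudonym("MRN-0001", key), "test": "glucose"}
visit_2 = {"patient": pseudonym("MRN-0001", key), "test": "lipid panel"}
other = {"patient": pseudonym("MRN-0002", key), "test": "glucose"}

# Records for the same person share a pseudonym; other people get different ones.
print(visit_1["patient"] == visit_2["patient"])
print(visit_1["patient"] == other["patient"])
```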

Re-identification

Pseudonymization through a trusted third party can support re-identification where the implementation requires it, for example to support case investigation and other public health event detection and management. Reasons for re-identification that should be considered include:

– Verification and validation of data integrity
– Checking for suspected duplicate records
– Enabling requests for additional data
– Linking to supplement research information variables
– Compliance audits
– Informing data subjects or their care providers of significant findings
– Facilitating follow-up research
– Law enforcement
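The trusted-third-party arrangement can be sketched as a service that alone holds the mapping between identities and pseudonyms and releases an identity only for an approved reason. This is an illustrative sketch, not a described implementation: the class `TrustedThirdParty`, its methods, and the reason strings are hypothetical, and only a few of the reasons listed above are encoded.

```python
import secrets

# Assumption: a policy body approves a fixed set of reasons (subset of the list above).
ALLOWED_REASONS = {
    "data integrity verification",
    "duplicate record check",
    "compliance audit",
    "follow-up research",
}

class TrustedThirdParty:
    """Holds the only copy of the identity <-> pseudonym mapping."""

    def __init__(self):
        self._by_identity = {}
        self._by_pseudonym = {}
        self.audit_log = []  # every successful re-identification is recorded

    def pseudonymize(self, identity: str) -> str:
        """Return the existing pseudonym for an identity, or issue a new random one."""
        if identity not in self._by_identity:
            new_pseudonym = secrets.token_hex(8)
            self._by_identity[identity] = new_pseudonym
            self._by_pseudonym[new_pseudonym] = identity
        return self._by_identity[identity]

    def reidentify(self, pseudonym: str, reason: str) -> str:
        """Release the identity only when the stated reason is approved."""
        if reason not in ALLOWED_REASONS:
            raise PermissionError(f"re-identification not permitted for: {reason}")
        self.audit_log.append((pseudonym, reason))
        return self._by_pseudonym[pseudonym]

ttp = TrustedThirdParty()
p = ttp.pseudonymize("MRN-0001")
print(ttp.reidentify(p, "compliance audit"))
```

Random pseudonyms are used here so that the mapping table is the only route back to the identity; recording each request gives the compliance audits mentioned above something to review.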

Issues Requiring Consideration

Frequency and types of errors in the de-identification method. De-id tools are subject to at least two types of errors:
(a) Undermarking: failure to remove information that constitutes one of the 18 HIPAA Safe Harbor data elements.
(b) Overmarking: removal of more information than is required, rendering records less useful and informative.
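Both error types can be counted by comparing what a tool marked against a manually annotated reference. The sketch below assumes each token in a record has a position number and that a reviewer has listed the positions that truly contain identifiers; the function name `marking_errors` and the sample positions are hypothetical.

```python
def marking_errors(true_phi, marked):
    """Compare tool output with a reference annotation.

    true_phi: token positions that really contain identifiers
    marked:   token positions the de-identification tool removed
    """
    true_phi, marked = set(true_phi), set(marked)
    undermarked = true_phi - marked   # identifiers the tool missed
    overmarked = marked - true_phi    # harmless tokens the tool removed
    recall = 1 - len(undermarked) / len(true_phi) if true_phi else 1.0
    precision = 1 - len(overmarked) / len(marked) if marked else 1.0
    return {
        "undermarked": undermarked,
        "overmarked": overmarked,
        "recall": recall,
        "precision": precision,
    }

# Fabricated example: identifiers sit at positions 0, 1 and 5; the tool removed 0, 1 and 7.
result = marking_errors(true_phi={0, 1, 5}, marked={0, 1, 7})
print(result["undermarked"], result["overmarked"])
```

Low recall signals undermarking (a disclosure risk), while low precision signals overmarking (a loss of useful content).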

NHIN Anonymization Guidelines

HIPAA de-identification element [45 CFR 164.514(b)(2)(i)], followed by the matching anonymization guideline(s)

(A) Names;
    3) Replace patient, contact, next of kin, provider, technician and any other person name data with fabricated data.
    10) Replace all employer, practice, laboratory, etc. names with fabricated names.

(B) All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code
    6) Replace all geographic location data (patient, provider, etc.) smaller than a state with fabricated data, including street address, city, county, precinct and zip code.

(C) All elements of dates (except year) for dates directly related to an individual, including admission date, discharge date, date of death; birth date, all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
    2) Replace all registration data columns with fabricated data (except for: gender_code, date_of_birth)
    a) Offset all result dates by a random number of days (between 1 and 90) into the past.
    a) Offset date_of_birth by random number of days between 1 and 90 into the past.

(D) Telephone numbers;
    5) Replace all telephone and fax numbers with a fabricated number, for example 222-555-1111. Use the fictitious exchange code “555” in all cases.


(E) Fax numbers;
    5) Replace all telephone and fax numbers with a fabricated number, for example 222-555-1111. Use the fictitious exchange code “555” in all cases.

(F) Electronic mail addresses;
    7) Replace all email addresses, URLs and IP addresses with fabricated data.

(G) Social security numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc., with fabricated numbers.

(H) Medical record numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc., with fabricated numbers.

(I) Health plan beneficiary numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc., with fabricated numbers.

(J) Account numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc., with fabricated numbers.

(K) Certificate/license numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc., with fabricated numbers.

(L) Vehicle identifiers and serial numbers, including license plate numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc., with fabricated numbers.

(M) Device identifiers and serial numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc., with fabricated numbers.

(N) Web Universal Resource Locators (URLs);
    7) Replace all email addresses, URLs and IP addresses with fabricated data.

(O) Internet Protocol (IP) address numbers;
    7) Replace all email addresses, URLs and IP addresses with fabricated data.

(P) Biometric identifiers, including finger and voice prints;
    11) Replace any other data that can be considered part of the HIPAA 18 individual identifiers.


(Q) Full face photographic images and any comparable images; and
    11) Replace any other data that can be considered part of the HIPAA 18 individual identifiers.

(R) Any other unique identifying number, characteristic, or code
    2a) Retain gender_code.
    11) Replace any other data that can be considered part of the HIPAA 18 individual identifiers.
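Two of the guidelines above are mechanical enough to sketch in code: offsetting a date by a random 1 to 90 days into the past, and replacing telephone or fax numbers with fabricated ones that use the "555" exchange. This is a rough illustration only. The function names, the simplified phone pattern (it handles only `NNN-NNN-NNNN` style numbers with `-`, `.` or space separators), and the sample text are assumptions, not part of the NHIN guidelines.

```python
import random
import re
from datetime import date, timedelta

def offset_into_past(d: date, rng: random.Random) -> date:
    """Shift a date between 1 and 90 days into the past."""
    return d - timedelta(days=rng.randint(1, 90))

# Simplified pattern: three digits, three digits, four digits.
PHONE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")

def fabricate_phones(text: str, rng: random.Random) -> str:
    """Replace each phone or fax number with a fabricated 555-exchange number."""
    def replacement(_match):
        area = rng.randint(200, 999)
        line = rng.randint(0, 9999)
        return f"{area}-555-{line:04d}"
    return PHONE.sub(replacement, text)

rng = random.Random()  # pass a seeded Random for reproducible test data

admitted = date(2008, 5, 17)
shifted = offset_into_past(admitted, rng)
print((admitted - shifted).days)  # some value from 1 to 90

note = "Call 954-123-4567 or fax 954.765.4321"
print(fabricate_phones(note, rng))
```

The guidelines do not say whether each date gets its own offset or one offset is drawn per patient; drawing one offset per patient would keep the intervals between that patient's dates intact, which may matter for testing.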

HITSP Pseudonymize Transaction: Patient Pseudo Identifying Information

Person Identifier Cross-Reference (PIX) Manager Query

Patient Identity Feed

Standards

• Health Insurance Portability and Accountability Act (HIPAA)
• Health Level Seven (HL7)
• Integrating the Healthcare Enterprise (IHE) IT Infrastructure Technical Framework (ITI-TF)
• International Organization for Standardization (ISO) Health Informatics - Pseudonymization, Technical Specification 25237

Summary

Data de-identification systems can help accomplish organizations' goals of improving quality of care, promoting research, and protecting privacy. However, producing anonymous data that remains specific enough to be useful is often a very difficult task. Although new technology offers some good choices, technical solutions alone remain inadequate. Technology must work with policy for the most effective solutions.

References

• ISO/IEC DTS 25237, "Pseudonymization Practices for the Protection of Personal Health Information and Health Related Services"
• HITSP Pseudonymize Transaction, Ready for Implementation, V2.1
• Nationwide Health Information Network (NHIN)

Contact Information

Sepideh Khosravifar, CISSP
For Department of Veterans Affairs
SAIC - Analyst IV
[email protected]
858-826-5447 office

Questions?
