Infotype Detector Reference | Data Loss Prevention Documentation
8/23/2020 InfoType detector reference | Data Loss Prevention Documentation
https://cloud.google.com/dlp/docs/infotypes-reference/

InfoType detector reference

Cloud Data Loss Prevention (DLP) uses information types, or infoTypes, to define what it scans for. An infoType is a type of sensitive data, such as a name, email address, telephone number, identification number, or credit card number. Every infoType defined in Cloud DLP has a corresponding detector. Cloud DLP uses infoType detectors in the configuration for its scans to determine what to inspect for and how to transform findings. InfoType names are also used when displaying or reporting scan results.

For more in-depth information about infoType detectors, see InfoTypes and infoType detectors (/dlp/docs/concepts-infotypes).

The Cloud DLP team releases new infoType detectors and groups periodically. To get the latest list of built-in infoTypes, call the infoTypes.list (/dlp/docs/reference/rest/v2/infoTypes/list) method of Cloud DLP.

Important: Built-in infoType detectors are not a 100% accurate detection method. For example, they can't guarantee compliance with regulatory requirements. You must decide what data is sensitive and how best to protect it. Google recommends that you test your settings to make sure your configuration meets your requirements.

Global

ADVERTISING_ID: Identifiers used by developers to track users for advertising purposes. These include Google Play Advertising IDs, Amazon Advertising IDs, Apple's identifierForAdvertising (IDFA), and Apple's identifierForVendor (IDFV).
AGE: An age measured in months or years.
CREDIT_CARD_NUMBER: A credit card number is 12 to 19 digits long. Credit card numbers are used for payment transactions globally.
CREDIT_CARD_TRACK_NUMBER: A credit card track number is a variable-length alphanumeric string. It is used to store key cardholder information.
DATE: A date.
This infoType includes most date formats, including the names of common world holidays. Note: Not recommended for use during latency-sensitive operations.
DATE_OF_BIRTH: A date of birth. Note: Not recommended for use during latency-sensitive operations.
DOMAIN_NAME: A domain name as defined by the DNS standard.
EMAIL_ADDRESS: An email address identifies the mailbox that emails are sent to or from. The maximum length of the domain name is 255 characters, and the maximum length of the local-part is 64 characters.
ETHNIC_GROUP: A person’s ethnic group.
FEMALE_NAME: A common female name. Note: Not recommended for use during latency-sensitive operations.
FIRST_NAME: A first name is defined as the first part of a PERSON_NAME. Note: Not recommended for use during latency-sensitive operations.
GENDER: A person’s gender identity.
GENERIC_ID: Alphanumeric and special-character strings that may be personally identifying but do not belong to a well-defined category, such as user IDs or medical record numbers.
IBAN_CODE: An International Bank Account Number (IBAN) is an internationally agreed-upon method for identifying bank accounts, defined by the ISO 13616:2007 standard of the International Organization for Standardization (ISO). The European Committee for Banking Standards (ECBS) created ISO 13616:2007. An IBAN consists of up to 34 alphanumeric characters, including elements such as a country code or account number.
HTTP_COOKIE: An HTTP cookie is a standard way of storing data on a per-website basis. This detector finds headers containing these cookies.
ICD9_CODE: The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) lexicon is used to assign diagnostic and procedure codes associated with inpatient, outpatient, and physician office use in the United States. The US National Center for Health Statistics (NCHS) created the ICD-9-CM lexicon. It is based on the ICD-9 lexicon, but provides more morbidity detail. The ICD-9-CM lexicon is updated annually on October 1.
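A scan configuration refers to detectors by the infoType names listed in these tables. The following is a minimal sketch of a request body for the DLP v2 REST content:inspect method, assuming its documented field names: item.value, inspectConfig.infoTypes, minLikelihood, and includeQuote. The helper names build_inspect_request and SLOW_INFOTYPES and the project ID "my-project" are hypothetical. SLOW_INFOTYPES simply collects the entries this reference marks as not recommended for latency-sensitive operations.

```python
import json

# Hypothetical helper data: infoTypes that this reference marks
# "Not recommended for use during latency-sensitive operations".
SLOW_INFOTYPES = {
    "DATE", "DATE_OF_BIRTH", "FEMALE_NAME", "FIRST_NAME", "LAST_NAME",
    "LOCATION", "MALE_NAME", "MEDICAL_TERM", "ORGANIZATION_NAME",
    "PERSON_NAME", "STREET_ADDRESS",
}


def build_inspect_request(text, info_types, low_latency=False):
    """Build a JSON body for the DLP v2 content:inspect REST method (sketch)."""
    if low_latency:
        # Skip detectors that this reference flags as slow.
        info_types = [name for name in info_types if name not in SLOW_INFOTYPES]
    return {
        "item": {"value": text},
        "inspectConfig": {
            "infoTypes": [{"name": name} for name in info_types],
            "minLikelihood": "POSSIBLE",
            "includeQuote": True,
        },
    }


def inspect_url(project_id):
    """REST endpoint for projects.content.inspect; project_id is a placeholder."""
    return f"https://dlp.googleapis.com/v2/projects/{project_id}/content:inspect"


body = build_inspect_request(
    "Contact jane@example.com", ["EMAIL_ADDRESS", "PERSON_NAME"], low_latency=True
)
print(inspect_url("my-project"))
print(json.dumps(body, indent=2))
```

The sketch leaves out authentication and sending the request. Because new detectors are released periodically, the names passed in should come from the infoTypes.list method rather than a hard-coded list.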
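The IBAN_CODE entry gives only the maximum length. An IBAN also carries two check digits after its two-letter country code, validated with the ISO 7064 MOD 97-10 scheme. To check them, move the first four characters to the end, replace each letter with a number (A=10 through Z=35), and confirm that the result modulo 97 equals 1. The following is a minimal sketch of that check. It uses a hypothetical function name and does not check per-country lengths. This reference doesn't say whether Cloud DLP applies this check.

```python
def iban_is_valid(iban: str) -> bool:
    """Check IBAN shape and its ISO 7064 MOD 97-10 check digits (sketch)."""
    s = iban.replace(" ", "").upper()
    # At most 34 alphanumeric characters per ISO 13616; 15 is roughly the
    # shortest national format and is used here only as a lower bound.
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    # Two-letter country code followed by two check digits.
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    # Move the country code and check digits to the end.
    rearranged = s[4:] + s[:4]
    # Convert letters to numbers (A=10 ... Z=35); digits stay unchanged.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

For example, iban_is_valid("GB82 WEST 1234 5698 7654 32") accepts the widely used sample IBAN, while changing any single digit makes it fail.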
ICD10_CODE: Like ICD-9-CM codes, the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) lexicon is a series of diagnostic codes. The World Health Organization (WHO) publishes the ICD-10-CM lexicon to describe causes of morbidity and mortality.
IMEI_HARDWARE_ID: An International Mobile Equipment Identity (IMEI) hardware identifier, used to identify mobile phones.
IP_ADDRESS: An Internet Protocol (IP) address (either IPv4 or IPv6).
LAST_NAME: A last name is defined as the last part of a PERSON_NAME. Note: Not recommended for use during latency-sensitive operations.
LOCATION: A physical address or location. Note: Not recommended for use during latency-sensitive operations.
MAC_ADDRESS: A media access control address (MAC address), which is an identifier for a network adapter.
MAC_ADDRESS_LOCAL: A local media access control address (MAC address), which is an identifier for a network adapter.
MALE_NAME: A common male name. Note: Not recommended for use during latency-sensitive operations.
MEDICAL_TERM: Terms that commonly refer to a person's medical condition or health. Note: Not recommended for use during latency-sensitive operations.
ORGANIZATION_NAME: A name of a chain store, business, or organization. Note: Not recommended for use during latency-sensitive operations.
PASSPORT: A passport number that matches passport numbers for the following countries: Australia, Canada, China, France, Germany, Japan, Korea, Mexico, the Netherlands, Poland, Singapore, Spain, Sweden, Taiwan, the United Kingdom, and the United States.
PERSON_NAME: A full person name, which can include first names, middle names or initials, and last names. Note: Not recommended for use during latency-sensitive operations.
PHONE_NUMBER: A telephone number.
STREET_ADDRESS: A street address. Note: Not recommended for use during latency-sensitive operations.
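Both CREDIT_CARD_NUMBER and IMEI_HARDWARE_ID describe numbers that end in a Luhn (mod 10) check digit. A full IMEI is 14 digits plus that check digit. Payment card numbers fall in the 12 to 19 digit range given above. The following is a minimal sketch of the Luhn check combined with those length rules. The function names are hypothetical, and this reference doesn't say whether Cloud DLP's detectors apply this check.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn (mod 10) checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost (check) digit; double every second digit.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of d
        total += d
    return total % 10 == 0


def looks_like_card_number(text: str) -> bool:
    digits = text.replace(" ", "").replace("-", "")
    # Length range from the CREDIT_CARD_NUMBER entry: 12 to 19 digits.
    return 12 <= len(digits) <= 19 and luhn_valid(digits)


def looks_like_imei(text: str) -> bool:
    digits = text.replace(" ", "").replace("-", "")
    # 14 identifying digits plus one Luhn check digit.
    return len(digits) == 15 and luhn_valid(digits)
```

A passing checksum only means a string is plausible. Roughly one in ten random digit strings of the right length also passes, so real detectors typically combine such checks with context.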
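The MAC_ADDRESS_LOCAL entry doesn't define "local". This sketch assumes it means a locally administered address. IEEE 802 marks such addresses by setting the U/L bit, the second-least-significant bit (0x02) of the first octet. The lowest bit (0x01) separately marks group (multicast) addresses. The parser and function names are hypothetical.

```python
import string


def parse_mac(mac: str) -> bytes:
    """Parse a colon- or hyphen-separated MAC address into 6 bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6 or any(
        len(p) != 2 or any(c not in string.hexdigits for c in p) for p in parts
    ):
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def is_locally_administered(mac: str) -> bool:
    # U/L bit (0x02 of the first octet): 1 means locally administered.
    return bool(parse_mac(mac)[0] & 0x02)


def is_multicast(mac: str) -> bool:
    # I/G bit (0x01 of the first octet): 1 means a group (multicast) address.
    return bool(parse_mac(mac)[0] & 0x01)
```

For example, 02:00:00:00:00:01 is locally administered, while a vendor-assigned address such as 00-1A-2B-3C-4D-5E is not.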
SWIFT_CODE: A SWIFT code is the same as a Bank Identifier Code (BIC). It's a unique identification code for a particular bank. These codes are used when transferring money between banks, particularly for international wire transfers. Banks also use the codes for exchanging other messages.
TIME: A timestamp of a specific time of day.
URL: A Uniform Resource Locator (URL).
VEHICLE_IDENTIFICATION_NUMBER: A vehicle identification number (VIN) is a unique 17-character alphanumeric code assigned to every on-road motor vehicle.

Credentials and secrets

The infoType detectors in this section detect credentials and other secret data.

AUTH_TOKEN: An authentication token is a machine-readable way of determining whether a particular request has been authorized for a user. This detector currently identifies tokens that comply with OAuth or Bearer authentication.
AWS_CREDENTIALS: Amazon Web Services account access keys.
AZURE_AUTH_TOKEN: Microsoft Azure certificate credentials for application authentication.
BASIC_AUTH_HEADER: A basic authentication header is an HTTP header used to identify a user to a server. It is part of the HTTP specification in RFC 1945, section 11.
ENCRYPTION_KEY: An encryption key within configuration, code, or log text.
GCP_API_KEY: A Google Cloud API key. This is an encrypted string that is used when calling Google Cloud APIs that don't need to access private user data.
GCP_CREDENTIALS: Google Cloud service account credentials that can be used to authenticate with Google API client libraries and service accounts.
JSON_WEB_TOKEN: A JSON Web Token in compact form. It represents a set of claims as a JSON object that is digitally signed using JSON Web Signature.
HTTP_COOKIE: An HTTP cookie is a standard way of storing data on a per-website basis. This detector finds headers containing these cookies.
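The VEHICLE_IDENTIFICATION_NUMBER entry gives only the length. VINs never use the letters I, O, or Q. North American VINs also carry a check digit in position 9. That digit is computed from letter transliteration values and per-position weights, with a remainder of 10 written as "X". The following is a minimal sketch that assumes the North American scheme, and the function name is hypothetical. VINs from other regions may not use a check digit, so a failed check is not proof that a string isn't a VIN.

```python
# Transliteration values for VIN characters (I, O, and Q are not allowed).
_VIN_VALUES = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Position weights; position 9 (the check digit itself) has weight 0.
_VIN_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def vin_check_digit_ok(vin: str) -> bool:
    """Verify the North American VIN check digit in position 9 (sketch)."""
    vin = vin.upper()
    if len(vin) != 17 or any(c not in _VIN_VALUES for c in vin):
        return False
    total = sum(_VIN_VALUES[c] * w for c, w in zip(vin, _VIN_WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected
```

For example, vin_check_digit_ok("1M8GDM9AXKP042788") returns True because the weighted sum is 351, and 351 mod 11 is 10, which matches the "X" in position 9.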
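The "compact form" in the JSON_WEB_TOKEN entry is three base64url segments joined by dots: a JSON header, a JSON claims set, and a signature. The segments are encoded without "=" padding (RFC 7515 and RFC 7519). The following is a minimal sketch, with hypothetical function names, that splits such a token and decodes its first two segments. It does not verify the signature, so its output says nothing about whether the token is authentic.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # Compact JWS drops base64url padding; restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)


def split_jwt(token: str):
    """Return (header, claims) from a compact JWS without verifying it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("compact JWS must have exactly three segments")
    header = json.loads(_b64url_decode(parts[0]))
    claims = json.loads(_b64url_decode(parts[1]))
    return header, claims
```

Because a token's claims are only encoded, not encrypted, anyone who finds one in a log can read them. That is one reason tokens are treated as secrets here.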
PASSWORD: Clear-text passwords in configs, code, and other text.
WEAK_PASSWORD_HASH: A weakly hashed password is a password stored with a hashing method that is easy to reverse-engineer. The presence of such hashes often indicates that a system's security can be improved.
XSRF_TOKEN: An XSRF token is an HTTP header that is commonly used to prevent cross-site request forgery (XSRF) attacks. Cross-site request forgery is a type of security vulnerability that can be exploited by malicious sites.

Argentina

ARGENTINA_DNI_NUMBER: An Argentine Documento Nacional de Identidad (DNI), or national identity card, is used as the main identity document for citizens.

Australia

AUSTRALIA_DRIVERS_LICENSE_NUMBER: An Australian driver's license number.
AUSTRALIA_MEDICARE_NUMBER: A 9-digit Australian Medicare account number is issued to permanent residents of Australia (except for Norfolk Island). The primary purpose of this number is to prove Medicare eligibility to receive subsidized care in Australia.
AUSTRALIA_PASSPORT: An Australian passport number.
AUSTRALIA_TAX_FILE_NUMBER: An Australian tax file number (TFN) is a number issued by the Australian Taxation Office for taxpayer identification. Every taxpaying entity, such as an individual or an organization, is assigned a unique number.

Belgium

BELGIUM_NATIONAL_ID_CARD_NUMBER: A 12-digit Belgian national identity card number.

Brazil

BRAZIL_CPF_NUMBER: The Brazilian Cadastro