Scan Techniques
Processing order of scan techniques
Privacera Discovery applies tags to dataset attributes using defined rules. This is done by comparing data against dictionaries and models. The application of tags depends on the order of relevant rules. After a rule is triggered, the rest of the relevant rules are not processed.
After creating rules, you can reorder them into the necessary sequence to ensure that your data is tagged appropriately. See Reorder Structured Rules for more information.
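The first-match behavior described above can be sketched roughly as follows. This is an illustration only: the `Rule` class, the sample rules, and the matcher functions are hypothetical and are not part of any Privacera API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    """Hypothetical stand-in for a Discovery rule: a tag plus a matcher."""
    tag: str
    matches: Callable[[str], bool]


def first_matching_tag(value: str, ordered_rules: List[Rule]) -> Optional[str]:
    """Return the tag of the first rule that matches; later rules are skipped."""
    for rule in ordered_rules:
        if rule.matches(value):
            return rule.tag
    return None


# Illustrative rules only; real rules compare data against dictionaries and models.
rules = [
    Rule("EMAIL_ADDR", lambda v: "@" in v),
    Rule("PII", lambda v: len(v) > 0),
]

print(first_matching_tag("jane@example.com", rules))        # EMAIL_ADDR
print(first_matching_tag("jane@example.com", rules[::-1]))  # PII
```

Reversing the rule order changes the resulting tag, which is why the sequence of rules matters.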
Add and scan resources in a data source
The following example enables scanning on an AWS-Aurora DB resource. It is recommended that you familiarize yourself with the names of the resources you want to enable before scanning as they will appear in a drop-down menu.
To enable scanning on an AWS resource, do the following:
From the navigation menu, select Discovery > Data Source.
From the Applications list, select AWS-Aurora DB.
Click Add to add a resource for scanning.
Type part of the resource name. A list of resources that match the text is displayed.
Select the scan type.
Click Save.
Click the Status toggle to globally enable scanning.
For real-time scan, resources will be automatically scanned when they are added to the Included Resources list.
For offline scan, click the Scan Resource button to initiate a scan.
Repeat these steps as needed for other data resources or applications you intend to enable for scanning.
The names of displayed fields will be different depending on the type of resource or application you are configuring (for example, Include Resource or Include Database or Table).
Resources in the landing zones are automatically scanned by Privacera. For more information on data zones, see Data Zones.
Google Cloud Storage and Google BigQuery
Using a single Google Cloud Storage or Google BigQuery data source, you can scan resources from multiple projects. You can search for projects to be added, and select resources from the project to be included for scanning. To retrieve the list of projects in Google Cloud Storage or Google BigQuery, configure the Google Cloud Manager API.
Note
Data Explorer does not support showing resources from multiple projects. It only shows resources for the project with which the data source is configured.
Prerequisites
To allow Privacera to search for projects on your Google account, you need to enable the API services in the GCP project you registered as a data source. Refer to the Google documentation to enable API services.
Add resources to Google Cloud Storage or Google BigQuery data sources
Before you can add resources to a data source, your data source must be registered and the prerequisite requirements must be met in order to continue. For more information on registering a data source, see data source registration.
To add resources to Google Cloud Storage or Google BigQuery data sources, do the following:
From the navigation menu, select Discovery > Data Source.
From the Applications section, select a Google Cloud Storage or Google BigQuery data source.
Click Add.
In the Add Resource dialog, enter the following:
Enter the Project ID of the resource you want to scan. You can enter an asterisk (*) to get a list of projects.
For Google BigQuery, the Project ID will be appended to the dataset or table name.
For Google Cloud Storage, the Project ID is not appended to the bucket name because bucket names are already unique.
Enter the Resource you are including in the project.
Note
Resources can be added from multiple projects. Existing resources will be updated with a project ID. If you have resources in a specific directory, you can add this location path so that all of the databases/tables in that location are scanned.
For Google Cloud Storage, add the bucket resources.
For Google BigQuery, add the datasets or tables.
Select a scan type:
Scan: Select this option if you want to perform real-time/offline scan.
Incremental: Select this option if you want to scan the resource once. During a re-scan, the resource gets added in the Excluded Resources list.
Multi-input: Turn on this button if you want to switch to a multiple input view and add multiple resources, one per line.
Click Save.
To enable the real-time/offline scan for the Google Cloud Storage or Google BigQuery data source, click the Status toggle.
Start or cancel a scan
There are several ways to start scans in Privacera Discovery:
From the Data Source page, which is described here.
For offline (re-scan) or realtime (continuous) scans, as detailed in Start Types of Scans.
If you have set up data zones, starting a scan (called reevaluation) is discussed in Data Zones.
Start a scan from the Data Source page
To start a scan from the Data Source page, follow these steps:
From the Applications section, select the application that contains the resource you want to scan.
In the Scanning Details section, locate the resource you want to scan.
Click SCAN RESOURCE.
A message appears indicating that a scan has been initiated.
Cancel a scan
To cancel a scan, follow these steps:
Go to the Scan Status page.
Locate the scan that you want to cancel.
Click Cancel.
Tags
Tags are an important part of Privacera Discovery and access control. In addition to security policies for resources and roles, you can create policies based on tags. Using tag-based policies, you can manage access to sensitive data regardless of where the data is stored.
Privacera Discovery scans data sources and tags all sensitive information across the enterprise. Example tags include PERSON_NAME, PII, ADDR, or EMAIL_ADDR. A dataset attribute, such as a column, table, or file, can be tagged with metadata information that can be used to classify the data asset. For example, a column titled "Email" or "Phone_Number" can be tagged as PII.
Tags enrich existing information about your data. Data administrators can create access control policies based on the tags created by Privacera Discovery. You can view your tags from the Tags Information page.
If you have defined rules, the generation of tags depends on the order of the rules. For more information, see Processing order of scan techniques and Reorder structured rules.
Add Tags
You can add tags in Privacera Discovery from the Tags Information page.
To add a tag, follow these steps:
From the Privacera home page, expand the Discovery menu and select Tags Information.
Click the + icon.
The Add Tag dialog is displayed.
In the Tag Name field, enter a name for the tag.
In the Description field, enter a description of the tag (optional).
Click Save.
The tag is added.
Edit Tag Descriptions
You can edit the descriptions of tags in Privacera Discovery from the Tags Information page.
Note
You cannot change a tag name after the tag is created.
To edit the description of a tag, follow these steps:
In the Tags Information page, select the tag you want to edit from the Tags list and click Edit.
The Edit Tag dialog is displayed.
Update the Description field.
Click Save.
The tag is updated.
Delete Tags
You can delete tags in Privacera Discovery from the Tags Information page.
To delete a tag, follow these steps:
In the Tags Information page, select the tag you want to delete from the Tags list and click Delete.
The following message is displayed: “Are you sure you want to delete this tag?”
Click Yes to delete the tag or No to return to the Tags Information page.
Search for Tags
You can search for tags in Privacera Discovery from the Tags Information page.
To search for a tag, enter the name of the tag into the Search Tag field.
Add, Edit, or Delete Tag Attributes
The Attributes section displays a list of attributes associated with a tag. You can search the list of attributes using the search box. The Attributes section also displays the total number of records with this tag.
To add an attribute for a specific tag, follow these steps:
In the Tags Information page, select the tag from the Tags list.
Click Add Attribute.
The Add Attribute dialog displays.
In the Name field, enter the name of the attribute.
In the Value field, enter the value of the attribute.
Click Save.
The attribute is added to the selected tag.
Note
You can delete or edit the attribute from the Actions column.
Export Tags
To export the tag file in JSON format, follow these steps:
Click Export.
Select the checkbox of each tag you want to export and click Export. You can select multiple tags.
The tag file is exported.
Import Tags
To import a tag file in JSON format, follow these steps:
Click the Import icon.
The Import dialog displays.
Select the JSON file you want to import.
Click Save.
The tag file is imported.
Fetch AWS S3 Tags
Privacera Discovery allows you to fetch AWS S3 tags. There are two types of tags that can be fetched:
Object Tags: Tags associated with the AWS S3 object or files in buckets.
Bucket Tags: Tags associated with the S3 bucket.
To fetch AWS S3 tags, follow these steps:
Navigate to Discovery > Tags Information and create a tag named AWS_S3_TAG.
Navigate to Settings > Data Source Registration and add or update the application properties as below:
Set "Fetch S3 Object Tags": true
Set "Fetch S3 Bucket Tags": true
Note
By default these properties are disabled and set to false.
Go to Data Inventory > Classifications and click AWS_S3_TAG under the Tag column, then click the View attributes link.
AWS S3 tags will be displayed in the Data Info grid.
Note
If the AWS_S3_TAG tag is not created, then AWS S3 tags will not be fetched and the tag will not be displayed in the Classification page.
If both the Object and Bucket tags are enabled and have a common tag, then the Object tag will override the Bucket tag. For example: If the Bucket tag is owner=user1 and the Object tag is owner=user2, then the AWS_S3_TAG tag will have owner=user2 as its attribute.
Tags fetched from AWS S3 will be added as attributes of the AWS_S3_TAG tag. This tag with attributes will be synced to Apache Ranger. Verify using the following URL: https://<EC2_Instance_IP>:6182/service/tags/tags.
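The override rule for a tag key shared by a bucket and an object can be pictured as a simple dictionary merge. The sketch below is only an illustration of that precedence; the function name and the `env` tag are hypothetical, and it does not call AWS or Privacera.

```python
def merge_s3_tags(bucket_tags: dict, object_tags: dict) -> dict:
    """Combine bucket and object tags; object tags win on a shared key."""
    merged = dict(bucket_tags)
    merged.update(object_tags)
    return merged


attributes = merge_s3_tags(
    {"owner": "user1", "env": "prod"},  # bucket tags ("env" is illustrative)
    {"owner": "user2"},                 # object tags
)
print(attributes)  # {'owner': 'user2', 'env': 'prod'}
```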
Dictionaries
Dictionaries are lists of values used to identify data elements. Privacera Discovery matches dictionaries against your resources and data. A dictionary can be applied to either content or metanames.
Example dictionaries include:
A dictionary of US person names used to identify names in a database.
A dictionary of common column name patterns used to identify a column of account IDs.
Dictionaries support multiple include/exclude patterns. This allows a gradual transition away from conventional patterns for pattern matching. For example, the 'email' conventional pattern and its associated structured and unstructured rules can be disabled and the same pattern value can be added as part of a new dictionary lookup. The resulting rules can then be configured just like conventional patterns.
Types of dictionaries
There are three types of dictionaries in Privacera Discovery:
Exact match: the value of the data must exactly match the value in the dictionary.
Fuzzy match: the matching is based on fuzzy logic instead of exact match.
Pattern match: the values in the dictionary are regular expressions.
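The difference between the three dictionary types can be sketched as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, and `difflib` is only a stand-in for fuzzy matching, since the fuzzy logic Discovery actually uses is not described here.

```python
import difflib
import re


def exact_match(value: str, entries: list) -> bool:
    """The value must be identical to one of the dictionary entries."""
    return value in entries


def fuzzy_match(value: str, entries: list, cutoff: float = 0.8) -> bool:
    """Accept values that are close to an entry (difflib similarity as a stand-in)."""
    return bool(difflib.get_close_matches(value, entries, n=1, cutoff=cutoff))


def pattern_match(value: str, entries: list) -> bool:
    """Each dictionary entry is treated as a regular expression."""
    return any(re.fullmatch(entry, value) for entry in entries)


print(exact_match("Jonathan", ["Jonathan"]))              # True
print(fuzzy_match("Jonathon", ["Jonathan"]))              # True
print(pattern_match("acct_id", [r"acc(oun)?t_?id"]))      # True
```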
Dictionary Keys
The key is used by Discovery rules to associate a tag with a resource element. Because a dictionary can be applied to either content or metaname, a naming convention is used for the key:
Content dictionary: LOOKUP suffix.
Metaname dictionary: KEYWORD suffix.
Manage dictionaries
Privacera Discovery comes pre-loaded with a set of useful dictionaries. You can also create your own custom dictionaries and configure rules to use them.
The values in a dictionary can come from a text file that can be uploaded through the portal or directly copied into your installation. For smaller dictionaries, you can add values using the Privacera portal either one by one or with the bulk input interface. For dictionaries that are file-based, you can add additional values or exclude existing values using the Privacera portal.
When a dictionary is created or modified, the updated dictionary becomes available for use within a few minutes.
Add a dictionary
To add a dictionary, follow these steps:
On the Dictionaries page, click the + sign.
The Add Dictionary dialog is displayed.
Enter the following details:
The Name of the dictionary (required)
The Description of the dictionary.
The Key field is not editable because it is populated by the system. Optionally, you can add IPv4 and IPv6 address regexes under Key description; these regexes are used to look up dictionary content.
The required File name.
Select the required Type: Exact, Pattern, or Fuzzy match.
Note
For pattern dictionaries, see Pattern Validation.
Select Apply For. The choices are content or metaname. If you select metaname, for pattern type dictionaries, you have the choice to apply the input tags directly to the resource. See Add Meta Tags Directly to Dictionary.
Select the Status (enabled by default).
Click Save.
The dictionary is added.
Add meta tags directly to a dictionary
When you create a new dictionary of type pattern, you can apply meta tags directly to a data source. The option appears after you select the combination of pattern and metaname.
Upload a dictionary
To upload a dictionary, follow these steps:
In the Dictionaries page, click Upload Dictionary.
The Upload Dictionary dialog is displayed.
Select the .txt file of the dictionary you want to upload.
Click Save.
The dictionary file is uploaded.
Edit a dictionary
To edit a dictionary, follow these steps:
In the Dictionaries page, select a dictionary from the dictionary list and click Edit.
The Edit Dictionary Info dialog is displayed.
Update the required fields.
Click Save.
The dictionary is updated.
Copy a dictionary
To make a copy of a dictionary, follow these steps:
On the Dictionary page, select a dictionary from the dictionary list and click Create Copy.
The Copy Dictionary Info dialog is displayed with the selected Type and Apply For values.
Enter the following details:
Enter the Name of the dictionary (required).
Enter the Description of the dictionary.
Enter the File name (required).
Select the Type (required).
Select the Apply For (required).
Select the Status (enabled by default).
Click Save.
A copy of the dictionary is created.
Enable or disable a dictionary
To enable or disable a dictionary, follow these steps:
On the Dictionaries page, select a dictionary from the Dictionary list.
Click the Status toggle to enable or disable the dictionary.
Search for a dictionary
To search for a dictionary, navigate to the Dictionaries page and enter the dictionary name into the search bar.
Dictionary tour
To see an explanation of the different components of a dictionary, click Tour on the Dictionaries page.
Include a Dictionary
You can filter the list of included dictionaries using the search included dictionary option. This tab also displays the current count of records relying on the dictionary.
The Include Dictionary tab displays the following:
Name: Name of the dictionary.
Description: The lookup/keyword description.
Actions: Edit or delete dictionaries.
Bulk Edit/Delete: Select this to edit or delete the dictionary values in bulk. After selecting, click x to delete the values.
Add keywords to an included dictionary
To add a keyword or lookup under Include Dictionary, follow these steps:
On the Dictionaries page, select a dictionary from the dictionary list.
In the Include Dictionary tab, click ADD.
The Add Dictionary dialog is displayed.
Enter the name of the keyword or lookup, one name per line.
Add a Description for the dictionary name.
Click Save.
The keyword or lookup is added to the selected dictionary in the Include Dictionary tab.
Exclude a dictionary
You can filter the list of excluded dictionaries using the search excluded dictionary option. This tab also displays the total record count.
The Exclude Dictionary tab displays the following information:
Name: Indicates name of the dictionary.
Actions: Allows you to edit and delete the dictionary.
To add a lookup in the Exclude Dictionary tab, follow these steps:
On the Dictionaries page, select a dictionary from the Dictionary list.
Select the Exclude Dictionary tab and click +Add.
The Add Dictionary dialog displays.
In the Name field, enter the names of the dictionaries, one name per line.
In the Description field, enter a description for the dictionary.
Click Save.
The lookup is added to the selected dictionary.
Import a dictionary
To import a dictionary in JSON format, follow these steps:
On the Discovery page, click Import.
The Import dialog is displayed.
Select the JSON file of the dictionary you want to import and click Save.
The dictionary configuration file is imported.
Export a dictionary
To export a dictionary in JSON format, follow these steps:
On the Dictionaries page, click Export.
Check the checkbox of the required dictionary and click Export.
Note
You can select multiple dictionaries.
The dictionary file is exported.
Test dictionaries
Pattern validation
If the dictionary is of type pattern, you can validate its regexes.
To validate a pattern, follow these steps:
In the Dictionaries page, add a new dictionary of type 'Patterns'.
The Add Dictionary field for the pattern type is displayed.
Enter a complex Expression (regex).
Enter the Description for the expression.
Enter the Input Test Data.
Click Test Expression.
The message "Passed" or "Failed" appears in the Test Output field.
Test against a data source
To test changes to a dictionary, follow these steps:
Perform an offline scan of the data source that has sensitive fields you want to test.
Check the Scan Status.
After the scan is completed, open the resource to verify if the scan classified the tags correctly.
The tags are classified under Data Inventory > Classification.
List of Privacera-supplied dictionaries
The following is a list of the Privacera-supplied dictionaries. The name of a dictionary in general describes the purpose of the dictionary. For precise details, look at the dictionary itself in the Platform UI.
AU_BSB_LOOKUP
BINARY_MIME_KEYWORD
CC_KEYWORD
CC_PROTECTED_KEYWORD (Disabled)
CITY_KEYWORD
COUNTY_KEYWORD
CRIMINAL_RECORD_LOOKUP
DISALLOW_DOB_KEYWORD (Disabled)
DISALLOW_NAME_KEYWORD (Disabled)
DISALLOW_ZIP_KEYWORD (Disabled)
DOB_KEYWORD
ETHNICITY_LOOKUP
EXEC_MIME_KEYWORD
GEO_KEYWORD
GPS_KEYWORD
IMAGE_MIME_KEYWORD
ISO3166_CC_LOOKUP
MEDICAL_RECORD_LOOKUP
ORG_LOOKUP
PASSPORT_KEYWORD
PASSWORD_KEYWORD
PERSON_NAME_KEYWORD
PERSON_NAME_LOOKUP
PII_ID_KEYWORD
SSN_KEYWORD
STATE_KEYWORD
SWIFT_BIC_KEYWORD (Disabled)
SWIFT_BIC_LOOKUP (Disabled)
TAX_ID_KEYWORD
UK_ELECTORAL_ROLL_KEYWORD (Disabled)
UK_NHS_KEYWORD (Disabled)
UK_NINO_KEYWORD (Disabled)
UK_POSTAL_TOWN_LOOKUP (Disabled)
US_ABA_NUMBER_KEYWORD (Disabled)
US_ADDRESS_KEYWORD
US_CITY_KEYWORD
US_CITY_LOOKUP
US_COUNTY_KEYWORD (Disabled)
US_COUNTY_LOOKUP (Disabled)
US_DLICENSE_KEYWORD
US_DLICENSE_LOOKUP
US_STATE_KEYWORD
US_STATE_LOOKUP
US_ZIP_KEYWORD
US_ZIP_LOOKUP
Patterns
Patterns are deprecated
Patterns are deprecated. Embed patterns in dictionaries instead.
Note
In a future release Discovery patterns will be removed from the left nav, because they are not used frequently. Instead, customers should now embed patterns in dictionaries. If you have any patterns in use, you should move them to dictionaries now.
Patterns are regular expressions (regexes) that match specific data elements in your data resources.
Privacera-supplied regular expressions can match common patterns like email addresses and URLs.
You can also define your own regexes to isolate patterns in your data to augment Privacera's patterns.
Add patterns
To add a pattern, do the following:
From the navigation menu, select Discovery > Patterns.
Click Add Pattern.
The Add Pattern dialog is displayed.
In the Pattern Name field, enter a pattern name.
From the Applied On dropdown menu, select one of the following options:
All: Pattern matching is applied at the file level (default).
File Content: Pattern matching is applied to the content of the file.
File Name: Pattern matching is applied based on file name.
Table/Column Name: Pattern matching is applied based on table or column name.
Using the Regex Status toggle, enable (default) or disable regexes.
In the Expression field, enter an expression. For example, an expression for a bank account number might be \b(\d{9}|\d{12})\b.
In the Description field, enter a description.
In the Input Test Data field, enter your test data.
Click Test Expression to verify the expression against the data you entered in the Input Test Data field.
The test result is displayed in the Test Output field.
Click Save.
The pattern is added.
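Before entering an expression in the portal, it can help to try it locally. The sketch below assumes the bank account example is meant as a standalone run of 9 or 12 digits, written `\b(\d{9}|\d{12})\b`; the helper name `find_matches` and the sample text are hypothetical.

```python
import re

# Assumed reading of the example expression: a standalone run of 9 or 12 digits.
BANK_ACCOUNT = re.compile(r"\b(\d{9}|\d{12})\b")


def find_matches(pattern: re.Pattern, test_data: str) -> list:
    """Return every substring of the test data that the pattern matches."""
    return [m.group(0) for m in pattern.finditer(test_data)]


sample = "acct 123456789, ref 1234567890, id 123456789012"
print(find_matches(BANK_ACCOUNT, sample))
# ['123456789', '123456789012']
```

The 10-digit value is not matched because the word boundaries require the whole run of digits to be exactly 9 or 12 long.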
Edit patterns
To edit a pattern, do the following:
On the Patterns page, locate the pattern you want to edit and click the Edit icon in the Actions column.
Update the required fields.
Click Save.
The pattern is updated.
Delete patterns
To delete a pattern, do the following:
On the Patterns page, locate the pattern you want to delete and click the Delete icon in the Actions column.
You are prompted with a message to confirm the deletion.
Click Yes to delete the pattern.
The pattern is deleted.
Search for patterns
To search for a pattern, enter the pattern name in the search bar on the Patterns page and press Enter.
The search results are displayed.
Export JSON pattern files
To export patterns to a file in JSON format, do the following:
On the Patterns page, click Export.
Select the patterns you want to export and click Export.
The pattern file is exported.
Import JSON pattern files
To import a pattern file in JSON format, do the following:
On the Patterns page, click Import.
Select the JSON file you want to import and click Save.
The pattern file is imported.
List of Privacera-supplied patterns
The following is a list of the Privacera-supplied patterns. You can view details about each of the patterns from the Patterns page.
ACCOUNT
CREDIT_CARD
EMAIL
FINANCIAL
IPV4
IPV6
MAC_ADDRESS
STREET_ADDRESS
UK_DRIVER_LICENSE
UK_ELECTORAL
UK_NINO
UK_POSTAL_CODE
UK_US_PASSPORT
URL
ZIPCODE
Scan status
After you trigger a manual scan, you can check the progress of the scan from the Scan Status page.
During a manual or offline rescan, if a file under a specified directory no longer exists, the scan marks the data as deleted in Classification. This is applicable only when realtime scan is disabled. The deleted resources are stale and can be viewed under Stale Resources.
Scan IDs that have not resulted in any tag classification are periodically removed from the status page.
Status summaries
To check the status of your scans, select Discovery > Scan Status from the navigation menu.
Scans can have the following statuses:
Pending: Number of scan requests in pending state.
Listing: Number of scan requests in listing state.
Running: Number of scan requests in running state.
Success: Number of successfully completed scan requests.
Failed: Number of failed scan requests.
Killed: Number of killed scan requests.
Cancelled: Number of cancelled scan requests.
Retry: Number of scan requests moved into retry state.
Note
Scanning durations are shown for data in different stages. For example, Listing shows the time taken to scan the existing data.
List of individual scans
The Scan Status page displays a table of individual scans that includes the following information:
Scan Id: The scan ID with a clickable link to view a summary of the scan.
Scan Type: Shown as Scan (a full scan) or Incremental.
Status: The status of the scan request.
Scan/Total Resource: The number of files or tables scanned out of the total number present in the scan request.
Application: The name of the application, such as Hadoop-Hive or Azure-ADLS.
Resource: The name of the resource. Click the resource to view the classification page for that resource.
Create Time: The date and time that the scan was triggered.
Start Time: The start time of the scan.
End Time: The end time of the scan.
Duration: The scan duration.
Request User: The name of the user who triggered the scan.
Type: The type of scan, such as offline or realtime scan.
Policy: The name of the policy.
View Individual Summary Reports
Click View Summary Report in the Scan Id column to view Summary Info details for the selected scan ID such as Tagged Resources, UnTagged Resources, Excluded, Failed Resources, Properties, Diagnostic Info, Logs, Scan Cleanup, and Stale Resources.
Export Scan Summary
To download the scan summary, click Export and follow the prompts.
Data zone movement
To view a summary of data zone movement, select Compliance Workflow > Data Zone Movement from the navigation menu.
View undefined data zone movements
On the Data Zone Movement page, click Show Undefined Zone Movements to view undefined zone movements.
Filter data zone movements
You can filter the data zone list using the Filter Data Zone option. You can also filter data zone movements by date range, including:
Today
Yesterday
Last 30 Days
This Month
Last Month
Custom Range
Note
By default, the date range is set to Last 7 Days.
Click Refresh to refresh the list of data zones.
Models
Models detect specific data elements in your data resources. The detection is done with various algorithms and heuristics.
Types of models
Privacera supports different types of models. You can filter the list of models using the search model option. This tab also displays the current record count.
Generic models
These are various general model parameters you can use to tailor matching of data.
Parameter | Data Type | Default | Description |
---|---|---|---|
| String | None | Patterns to be matched. Can contain more than one pattern by changing the value of the |
| String | None | Patterns to be excluded from matching. Can contain more than one pattern by changing the value of the |
| Boolean | FALSE | Indicates whether matching should use only the digits. Setting this parameter TRUE removes all non-numeric characters in the string before matching. For example, |
| String | None | Indicates whether to evaluate a checksum digit based on the last digit. Valid values: |
| Boolean | FALSE | Indicates whether to use patterns specified by the |
| String | None | A dictionary name or key. See Dictionaries. |
| String | None | Pattern for matching. See Patterns. Note: See Embed Patterns in Dictionaries. |
| Boolean | FALSE | Indicates whether to use Privacera-defined matching to validate an ISO two-character country code. If this parameter is set to TRUE, |
| | None | A valid pattern for matching country codes. See Patterns. Note: See Embed Patterns in Dictionaries. |
| | None | Name of a defined dictionary. See Dictionaries. |
Credit card model
The credit card model detects credit card numbers. It validates numbers based on the issuing network, length, and Luhn checksum.
Parameter | Type | Default | Meaning |
---|---|---|---|
| String | Privacera-supplied pattern for credit card numbers with range of digits, space or hyphen separated. | Credit card pattern, if you want to override the supplied pattern. |
| Boolean | True | Validate against known issuing network prefixes. |
| Boolean | True | Validate the Luhn checksum on the credit card number. |
Supported credit card types
Credit Card Type | Conditions | Examples |
---|---|---|
American Express (AMEX) Card | Credit card starting with 34 or 37 and having 15 digits. | 34xxxxxxxxxxxxx 37xxxxxxxxxxxxx |
Master Card | | 51xxxxxxxxxxxx 2221xxxxxxxx 27xxxxxxxxxxx |
Visa Card | Credit card starting with 4 and having 13 or 16 digits. | 4xxxxxxxxxxxx 4xxxxxxxxxxxxxxx |
Diners Club Card | Credit card starting with 300 to 305 or 3095 or 36 or 38 or 39 and having 14 digits. | 300xxxxxxxxxxx 3095xxxxxxxxxx |
VPay (Visa) Card | Credit card starting with 4 and having 13 or 19 digits. | 4xxxxxxxxxxxx 4xxxxxxxxxxxxxxxxxx |
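The combination of prefix, length, and Luhn validation can be sketched as below. This is a minimal illustration, not Privacera's implementation: the `NETWORKS` table covers only three of the card types listed above, and the function names are hypothetical. The Luhn algorithm itself is standard.

```python
import re


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum the digits."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Prefix and length conditions taken from the table above (subset only).
NETWORKS = {
    "American Express": (re.compile(r"3[47]"), {15}),
    "Visa": (re.compile(r"4"), {13, 16}),
    "Diners Club": (re.compile(r"30[0-5]|3095|36|38|39"), {14}),
}


def detect_card(text: str):
    """Return the issuing network for a valid card number, else None."""
    digits = re.sub(r"[ -]", "", text)  # allow space or hyphen separators
    if not (digits.isascii() and digits.isdigit()):
        return None
    for network, (prefix, lengths) in NETWORKS.items():
        if prefix.match(digits) and len(digits) in lengths and luhn_valid(digits):
            return network
    return None


print(detect_card("4111 1111 1111 1111"))  # Visa
print(detect_card("4111111111111112"))     # None (fails the Luhn checksum)
```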
Date of birth model
The Date of Birth model detects various date formats.
Parameter | Type | Default | Meaning |
---|---|---|---|
| Integer | 5 | Age lower threshold. |
| Integer | 100 | Age upper threshold. |
| Boolean | True | Tagging is done based on an algorithm to detect random distribution. |
| String | – | Pattern that matches a custom date format var1. |
| String | – | Date Format that matches the pattern for var1. |
Pre-configured date formats are:
International YYYYMD format with 4 digit year
US MDY with 4 digit or 2 digit year
Month abbreviated MDY
Additional formats can be configured by pairing a regex with a matching Java date format.
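As a rough illustration of pairing a regex with a date format and applying the age thresholds, consider the sketch below. Everything in it is an assumption for illustration: the day.month.year format, the function name, and the inclusive treatment of the thresholds. Python `strptime` directives stand in for the Java date format (the Java equivalent of `%d.%m.%Y` would be `dd.MM.yyyy`).

```python
import re
from datetime import date, datetime

# Illustrative custom format: day.month.year, for example "31.12.1990".
CUSTOM_PATTERN = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")
CUSTOM_FORMAT = "%d.%m.%Y"  # Python directive; the Java form would be dd.MM.yyyy


def is_plausible_dob(text: str, today: date, min_age: int = 5, max_age: int = 100) -> bool:
    """True if text holds a date in the custom format whose age is within the thresholds."""
    match = CUSTOM_PATTERN.search(text)
    if not match:
        return False
    try:
        born = datetime.strptime(match.group(0), CUSTOM_FORMAT).date()
    except ValueError:  # matches the regex but is not a real date, e.g. 31.02.1990
        return False
    # Subtract one year if the birthday has not occurred yet this year.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return min_age <= age <= max_age


reference_day = date(2024, 6, 1)  # fixed date so the example is reproducible
print(is_plausible_dob("31.12.1990", reference_day))  # True
print(is_plausible_dob("01.01.2022", reference_day))  # False (younger than 5)
```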
EIN model
The EIN model detects Employer Identification Number using patterns and digit validation.
Parameter | Type | Default | Meaning |
---|---|---|---|
| String | Default | EIN digit pattern if you want to override the default pattern. |
| Boolean | True | Age upper threshold. |
| Boolean | True | Allow match only if EIN has exact format. |
Geo latitude and longitude model
The geo model detects latitude and longitude coordinates. It can validate these values based on a geographical area.
Parameter | Type | Default | Meaning |
---|---|---|---|
| Double | US min latitude | Lower limit (southern) on latitude. |
| Double | US max latitude | Upper limit (northern) on latitude. |
| Double | US min longitude | Lower limit (west) on longitude. |
| Double | US max longitude | Upper limit (east) on longitude. |
| Integer | 3 | Minimum number of digits after the decimal point. |
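A bounding-box check combined with a minimum number of fractional digits might look like the sketch below. The bounding-box values are illustrative placeholders, not Privacera's actual defaults, and the constant and function names are hypothetical apart from `MIN_FRACTIONAL_DIGITS`, which appears as a model property key elsewhere in this guide.

```python
import re

# Illustrative bounding box only; these are not Privacera's actual defaults.
MIN_LAT, MAX_LAT = 24.0, 50.0
MIN_LON, MAX_LON = -125.0, -66.0
MIN_FRACTIONAL_DIGITS = 3

COORD = re.compile(r"-?\d{1,3}\.(\d+)")


def has_precision(value: str) -> bool:
    """True if the value is a decimal number with enough digits after the point."""
    match = COORD.fullmatch(value)
    return bool(match) and len(match.group(1)) >= MIN_FRACTIONAL_DIGITS


def is_geo_pair(lat: str, lon: str) -> bool:
    """True if both values are precise enough and fall inside the bounding box."""
    if not (has_precision(lat) and has_precision(lon)):
        return False
    return MIN_LAT <= float(lat) <= MAX_LAT and MIN_LON <= float(lon) <= MAX_LON


print(is_geo_pair("37.7749", "-122.4194"))  # True
print(is_geo_pair("37.77", "-122.41"))      # False (too few fractional digits)
print(is_geo_pair("51.5074", "-0.1278"))    # False (outside the bounding box)
```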
IMEI model
The IMEI model detects International Mobile Equipment Identity numbers that are used to identify mobile phones. It validates the Luhn checksum and the length of the IMEI.
ITIN model
The ITIN model detects Individual Tax Identifier Numbers (identifiers of individual taxpayers). It validates the format and digits of the ITIN.
Parameter | Type | Default | Meaning |
---|---|---|---|
| String | Default | ITIN digit pattern if you want to override the default pattern. |
| Boolean | True | Allow match only if ITIN has exact format. |
MIME model
The MIME model detects a file based on its Multipurpose Internet Mail Extensions type. The MIME type is detected using a combination of file extension and magic bytes in the header of the file. The detected MIME type is then looked up in a dictionary of MIME types.
Parameter | Type | Default | Meaning |
---|---|---|---|
| String | – | Identifier of dictionary of MIME types. |
There are two pre-configured MIME model instances.
For detecting executable files: LOOKUP_DICT=EXEC_MIME_KEYWORD.
For detecting image files: LOOKUP_DICT=IMAGE_MIME_KEYWORD.
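Detection from magic bytes and file extension, followed by a dictionary lookup, can be sketched as follows. The dictionary contents, the function names, and the choice to prefer header bytes over the extension are all assumptions made for illustration; the signatures shown are the well-known PNG and JPEG file headers.

```python
import mimetypes

# Hypothetical dictionary contents; the real entries live in the dictionary itself.
IMAGE_MIME_TYPES = {"image/png", "image/jpeg"}

# A few well-known file signatures (magic bytes).
MAGIC_BYTES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def detect_mime(file_name: str, header: bytes):
    """Prefer the type implied by the header bytes; fall back to the extension."""
    for signature, mime in MAGIC_BYTES.items():
        if header.startswith(signature):
            return mime
    guessed, _encoding = mimetypes.guess_type(file_name)
    return guessed


def is_image(file_name: str, header: bytes) -> bool:
    """Look up the detected MIME type in the (hypothetical) image dictionary."""
    return detect_mime(file_name, header) in IMAGE_MIME_TYPES


print(is_image("report.txt", b"\x89PNG\r\n\x1a\n\x00\x00"))  # True (header wins)
print(is_image("notes.txt", b"hello"))                       # False
```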
Phone number model
The Phone Number model detects phone numbers. It validates the format of the phone numbers based on the country for which it is configured.
Parameter | Type | Default | Meaning |
---|---|---|---|
| String | US | Two-character country code. |
SSN model
The SSN model detects US Social Security Numbers. It validates the format and checks against a blacklist of SSN numbers.
Parameter | Type | Default | Meaning |
---|---|---|---|
| String | Default | Override the default SSN pattern. |
| Boolean | True | Validate against known blacklist of SSNs. |
| Boolean | False | Allow match only if SSN has exact format. |
| Boolean | False | Match against any nine digit number without format. |
| Boolean | False | Match against any four digit number without format. Disables validation with blacklist of SSN. |
| Boolean | True | Allow match only if SSN has exact format that is hyphen-, dot-, or space-separated. |
Examples of Invalid SSNs
The SSN model would determine that the following SSNs are invalid.
SSNs starting with 9, 666, 000, or 98765432.
SSNs with 00 as the fourth and fifth digits.
SSNs with 0000 as the sixth through ninth digits.
Any of the following SSNs:
123456789
111111111
222222222
333333333
444444444
555555555
666666666
777777777
888888888
999999999
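The conditions listed above can be expressed as a short Python sketch. The function name is hypothetical, and treating any value that is not nine digits as invalid is an assumption of this example rather than documented behavior.

```python
import re

# The sequential and repeated-digit values listed above.
BLACKLIST = {"123456789"} | {str(d) * 9 for d in range(1, 10)}


def is_invalid_ssn(candidate: str) -> bool:
    """Apply the invalid-SSN conditions listed above to a candidate value."""
    # Drop the hyphen, dot, or space separators before checking the digits.
    digits = re.sub(r"[-. ]", "", candidate)
    if re.fullmatch(r"[0-9]{9}", digits) is None:
        return True  # assumption: anything that is not nine digits is invalid
    if digits.startswith(("9", "666", "000", "98765432")):
        return True
    if digits[3:5] == "00" or digits[5:9] == "0000":
        return True
    return digits in BLACKLIST
```

For instance, `is_invalid_ssn("666-12-3456")` and `is_invalid_ssn("123-45-0000")` both return `True`, while `is_invalid_ssn("123-45-6780")` returns `False`.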
VIN model
The VIN model detects Vehicle Identification Numbers. It validates the length and the VIN checksum.
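For reference, the length and checksum validation can be sketched in Python using the standard North American VIN check-digit scheme (position 9, weighted sum modulo 11). Whether the VIN model uses exactly this scheme is an assumption, and the names below are hypothetical.

```python
# Letter-to-number transliteration; I, O, and Q are not allowed in a VIN.
TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}

# Position weights; the check digit itself (position 9) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def vin_valid(vin: str) -> bool:
    """Check the 17-character length and the position-9 check digit."""
    vin = vin.upper()
    if len(vin) != 17 or any(ch not in TRANSLITERATION for ch in vin):
        return False
    total = sum(TRANSLITERATION[ch] * weight for ch, weight in zip(vin, WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected
```

For example, `vin_valid("1M8GDM9AXKP042788")` returns `True` because the weighted sum leaves a remainder of 10, which is written as `X`.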
Zip model
The Zip model detects US Zip codes. It detects both 5 digit and 5+4 digit variations and validates against a dictionary of US Zip codes.
Parameter | Type | Default | Meaning |
---|---|---|---|
| String | | Key of the US Zip dictionary. |
| String | Default | Regular expression used to validate content against the list of Zip codes. |
| Boolean | False | Allow match only if Zip code has exact format. If set to true then only nine digits containing '-' and starting with five digits are considered a Zip code. |
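The following Python sketch illustrates how the 5-digit and 5+4 variations, the dictionary lookup, and the exact-format option might interact. The pattern, the tiny `KNOWN_ZIPS` set, and the `exact_format` argument name are placeholders chosen for this example, not the model's real parameters.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits.
ZIP_PATTERN = re.compile(r"([0-9]{5})(?:-([0-9]{4}))?")

# Hypothetical stand-in for the US Zip dictionary.
KNOWN_ZIPS = {"10001", "94105"}


def match_zip(candidate: str, exact_format: bool = False) -> bool:
    """Match 5-digit or 5+4 Zip codes and validate the first five digits."""
    match = ZIP_PATTERN.fullmatch(candidate.strip())
    if match is None:
        return False
    if exact_format and match.group(2) is None:
        return False  # exact format: only the hyphenated nine-digit form counts
    return match.group(1) in KNOWN_ZIPS
```

With `exact_format=True`, `match_zip("10001")` is rejected while `match_zip("10001-1234")` is still accepted.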
Create a model
To create a model, follow these steps:
From the navigation menu, select Discovery > Models.
Click Add Model.
The Add Model dialog is displayed.
In the Name field, enter a name for the model.
In the Description field, enter a description of the model.
In the Key field, enter a model key.
From the Type dropdown menu, select a model type.
Note
See Types of Models for more information.
From the Apply For dropdown menu, select File content.
Note
File content is resource content.
Enable or disable the model using the Model Status toggle.
Add model properties by clicking +.
Enter a key and value into the Key and Value fields.
Note
For example: Key: MIN_FRACTIONAL_DIGITS, Value: 2. You can add multiple model properties.
Click Save.
The model is created.
Edit a model
You can edit a model by clicking the Edit icon in the Actions column.
To edit a model, follow these steps:
Click the Edit icon in the Actions column.
The Edit Model dialog displays.
Make your desired changes.
Click Save.
The model is updated.
Delete a model
You can delete a model by clicking the Delete icon in the Actions column.
To delete a model, follow these steps:
Click the Delete icon in the Actions column.
The Confirm Delete dialog displays.
Select Delete to confirm the deletion.
The model is deleted.
Import a model
To import a model file in JSON format, follow these steps:
On the Models page, click Import.
The Import dialog is displayed.
Browse and select the JSON file and click Import.
The model file is imported.
Export a model
To export a model file in JSON format, follow these steps:
In the Models page, click Export.
From the drop-down menu, select one of the following options:
All Records: Export the entire set of models.
Select Records: Select the specific model to export. You can select multiple models.
Click Export.
The JSON file is exported.
List of Privacera-supplied models
The following is a list of the Privacera-supplied models. For precise details, look at the model itself in the Platform UI.
DOB_ML_MODEL
CC_ML_MODEL
ZIP_ML_MODEL
IMEI_ML_MODEL
SSN_ML_MODEL
EXEC_ML_MODEL
MIME_ML_MODEL
PHONE_NUMBER_ML_MODEL
GEO_LAT_LONG_ML_MODEL
CC_ML_MODEL_PROTECTED
EIN_ML_MODEL
ITIN_ML_MODEL
VIN_ML_MODEL
SSN_9_DIGIT_ML_MODEL
SSN_4_DIGIT_ML_MODEL
IMAGE_FILE_ML_MODEL
IMAGE_ML_MODEL
Disallowed Tags policy
This policy monitors data and raises an alert if any PII tags are identified. You can add multiple tags by pressing Enter after each value.
The Disallowed Tags policy has the following fields:
Name: The name of the Disallowed Tags policy.
Type: The type of policy.
Alert Level: The alert level: high, medium, or low.
Description: The description of the Disallowed Tags policy.
Disallowed Tags: Allows you to add multiple tags to be disallowed.
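Conceptually, the policy compares the tags found on a resource with its list of disallowed tags. The Python sketch below models that comparison. The class, its field names, and the alert structure are hypothetical and only mirror the fields listed above.

```python
from dataclasses import dataclass, field


@dataclass
class DisallowedTagsPolicy:
    """Hypothetical in-memory model of the policy fields listed above."""
    name: str
    alert_level: str  # "high", "medium", or "low"
    disallowed_tags: set = field(default_factory=set)

    def evaluate(self, resource: str, resource_tags: set) -> list:
        """Return one alert per disallowed tag found on the resource."""
        hits = sorted(self.disallowed_tags & resource_tags)
        return [
            {"policy": self.name, "level": self.alert_level,
             "resource": resource, "tag": tag}
            for tag in hits
        ]
```

A resource tagged `SSN` and `ZIP` evaluated against a policy that disallows `SSN` and `EMAIL` would produce a single alert for `SSN`.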
Add Disallowed Tags policy
If you create Disallowed Movement and Disallowed Tags policies, you can capture Data Zone movement using Spark. Data Zone movement can be captured from HDFS to S3.
To capture Data Zone movement using Spark, follow these steps:
Note
These data zones are examples. You should create your own.
Create directories in HDFS and add the file in one of the HDFS locations:
hdfs dfs -mkdir /colour/purple
hdfs dfs -mkdir /colour/pink
hdfs dfs -put /finance_us.csv /colour/purple/
Add both directories to the Included Resources list of the HDFS data source.
Create two Data Zones and add the two folders in those two Data Zones' Resources.
SourceDz: Contains the source resource, for example /colour/purple/, and the Data Zone tag.
DestinationDz: Contains the destination resource, for example /colour/pink/, and the policies configured for disallowed movement and disallowed tags.
Set the Application property as follows:
Generate Alert All Part Files = false
Note
If you set Generate Alert All Part Files to false, the system generates an alert for the first two part files. If you set this property to true, the system generates an alert for all part files.
Go to the terminal and log into Spark shell as follows:
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/colour/purple/finance_us.csv")
scala> df.coalesce(1).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").save("/colour/pink/finance_us_11")
scala> df.repartition(4).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").save("/colour/pink/finance_us_100")
The following output is displayed:
Kafka Topics: Check the Kafka topics audit consumption for Alerts and Lineage.
Alerts Details: Check the Alerts Details tab on the resource details for this resource.
Lineage: Check the Lineage for this resource.
Alerts Generated for part file: Check the Data Zone Graph for the alerts generated for the part files in DestinationDz.
Rules
You can create and manage custom and system-provided rules in Privacera Discovery. By executing the conditions in each rule, Discovery applies classifications to your data. The output tag associated with the processed rule is applied to the resource as the final tag.
The generation of tags depends on the order of the rules. See Processing Order of Scan Techniques and Reorder Structured Rules.
You can also create rule mappings.
Types of rules
There are three types of rules in Privacera Discovery:
Structured
Unstructured
Post-processing
Example rules and classifications
Based on the tags that structured or unstructured rules find in a file or in the columns of a table, you can assign a tag to the file or the table. Each such rule is an AND condition over output tags. For example, you can set multiple rules as follows:
If a file has PERSON_NAME AND EMAIL AND SSN, tag as PII.
If a file has USER_ID AND GEO, tag as SENSITIVE.
If a file has USER_ID AND IP, tag as SENSITIVE.
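The three example rules above can be read as subset tests over the tags already found on a file. The Python sketch below illustrates that reading. The `RULES` structure and the `classify` function are hypothetical and are not how Discovery stores or evaluates rules.

```python
# Each rule pairs the tags that must all be present with the tag to apply.
RULES = [
    ({"PERSON_NAME", "EMAIL", "SSN"}, "PII"),
    ({"USER_ID", "GEO"}, "SENSITIVE"),
    ({"USER_ID", "IP"}, "SENSITIVE"),
]


def classify(found_tags) -> set:
    """Return the output tag of every rule whose required tags are all present."""
    found = set(found_tags)
    return {output for required, output in RULES if required <= found}
```

A file with only `PERSON_NAME` and `EMAIL` receives no additional tag, because the AND condition of the first rule is not fully met.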
Create a structured rule
To create a structured rule, follow these steps:
From the navigation menu, select Discovery > Rules.
On the Rules page, click Structured > Create Rule.
The Create Rule dialog is displayed.
In the Create Rule dialog, enter the following details:
Name: The name of the rule.
Description: A description of the rule (optional).
Must Have: From the dropdown menu, select dictionaries, patterns, or models to be included in the rule.
Must Not Have: From the dropdown menu, select dictionaries, patterns, or models to be excluded from the rule.
Score Type: From the dropdown menu, select one of the following options:
Auto: If the rule is applied, the resource is classified as System.
Review: If the rule is applied, the resource is classified as Pending Review.
Output Tags: The tags associated with the rule.
Key For Samples: The keys from the objects in the Must Have dropdown menu.
Enable rule: The rule is enabled or disabled.
Review the information in the Rule preview section.
Click Save.
The structured rule is created.
Reorder structured rules
Rule order determines the priority in which rules are applied during classification.
To reorder rules, follow these steps:
On the Rules page, click Reorder.
Drag the rules up or down to change the order.
Click Save Order.
The new order is saved.
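To see why order matters, consider this small Python sketch. It assumes that the first matching rule wins and that Must Have means all selected keys must be present. The rule names and keys are invented for illustration.

```python
def first_matching_rule(ordered_rules, found_keys):
    """Return the name of the first rule that matches, or None.

    Each rule is a (name, must_have, must_not_have) tuple of sets.
    """
    for name, must_have, must_not_have in ordered_rules:
        if must_have <= found_keys and not (must_not_have & found_keys):
            return name  # later rules are not evaluated
    return None


# Hypothetical rules: one general, one more specific.
general = ("rule_general", {"SSN_KEY"}, set())
specific = ("rule_specific", {"SSN_KEY", "NAME_KEY"}, set())
```

With the keys `SSN_KEY` and `NAME_KEY` found, the order `[general, specific]` selects `rule_general`, while `[specific, general]` selects `rule_specific`. Under this assumption, more specific rules would be placed higher in the list.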
Create an unstructured rule
To create an unstructured rule, follow these steps:
From the navigation menu, select Discovery > Rules.
On the Rules page, click Unstructured > Create Rule.
The Create Rule dialog is displayed.
Enter the following details:
Rule Name: Name of the rule.
Description: Description of the rule (optional).
Must Have: From the dropdown menu, select dictionaries, patterns, or models to be included in the rule.
Must Not Have: From the dropdown menu, select dictionaries, patterns, or models to be excluded from the rule (optional).
Word Proximity: Name of a pattern to identify sensitive information within the specified number of words.
Key order strict: Using the toggle, indicate whether key order is strictly followed.
Enable rule: Using the toggle, enable or disable the rule.
Review the information in the Rule preview section.
Click Save.
The unstructured rule is created.
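One way to picture the Word Proximity setting is as a distance check between a keyword and a pattern match in free text. The Python sketch below is only an interpretation of that idea. The function, its arguments, and the whitespace tokenization are assumptions, not the product's matching logic.

```python
import re


def within_proximity(text: str, keyword: str, pattern: str, max_words: int) -> bool:
    """Return True if a pattern match lies within max_words of the keyword."""
    # Split on whitespace and strip common trailing punctuation from each word.
    words = [word.strip(".,:;") for word in text.split()]
    keyword_positions = [
        i for i, word in enumerate(words) if word.lower() == keyword.lower()
    ]
    match_positions = [
        i for i, word in enumerate(words) if re.fullmatch(pattern, word)
    ]
    return any(
        abs(k - m) <= max_words
        for k in keyword_positions
        for m in match_positions
    )
```

In the sentence "The SSN of the customer is 123-45-6780 today", the number is five words after the keyword, so it matches with `max_words=5` but not with `max_words=3`.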
Create a rule mapping
To create a rule mapping, follow these steps:
From the navigation menu, select Discovery > Rules.
On the Rules page, click Rule Mapping > Add Mapping.
The Add Key Tag Mapping dialog is displayed.
From the Key dropdown menu, select a dictionary, pattern, or model.
From the Tag dropdown, select a tag.
Note
You can add multiple keys and tags by clicking +.
Click Save.
The rule mapping is created.
Export rules and mappings
To export a rule file in JSON format for a structured rule, follow these steps:
From the navigation menu, select Discovery > Rules.
Click Export.
Select the files you wish to export.
Click Export.
The rule file is exported.
Import rules and mappings
To import a JSON rule file for a structured rule, follow these steps:
From the navigation menu, select Discovery > Rules.
On the Rules page, click Import.
The Import dialog is displayed.
Click Choose File and select the JSON file.
Note
Selecting Clean Previous deletes all existing rules.
Click Save.
The rule file is imported.
Post-processing in real-time and offline scans
With post-processing, the data is scanned and then the rules are applied on the tagged data in multiple passes. Post-processing can be used with both real-time and offline scans. Based on the output tags of the rules applied after the initial scan, with post-processing you can add additional tags on the parent or child data resources.
Post-processing rules should be applied after Data Zone and tag propagation is complete.
For example, after the initial scan of a structured or unstructured file or columns within a table, Privacera Discovery will identify the data and classify it with tags based on the rules. After the initial scan has tagged various columns within a table or a file, you can use post-processing rules to assign additional tags to the file or the parent table.
Enable post-processing
To enable post-processing, follow these steps:
Navigate to Setting > System Configuration.
Search for the property privacera.portal.rules.post_process.enable=false.
Note
The default setting is false.
Set the property to true.
Example of post-processing rules on tags
From the navigation menu, select Discovery > Rules.
On the Rules page, select Post-Processing.
Create a new rule with the following condition: If PERSON_NAME and SSN are found, apply the SENSITIVE tag.
Rescan the file to apply the post-processing rules.
The fields are now classified as SENSITIVE and the tag is applied in the unformatted view.
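The multi-pass idea behind post-processing can be sketched in Python as follows: tags from the initial scan are collected per column, and rules are re-applied until no new parent tag is added. The data structures and the `RESTRICTED` tag are hypothetical, and the loop is a simplified model rather than the actual scan pipeline.

```python
def post_process(column_tags: dict, rules: list) -> set:
    """Derive parent-level tags from the tags found on child columns.

    column_tags maps a column name to the set of tags from the initial scan.
    rules is a list of (required_tags, output_tag) pairs.
    """
    found = set().union(*column_tags.values())
    parent_tags = set()
    changed = True
    # Repeat passes so a rule can build on the output tag of another rule.
    while changed:
        changed = False
        for required, output in rules:
            if required <= (found | parent_tags) and output not in parent_tags:
                parent_tags.add(output)
                changed = True
    return parent_tags
```

For a table with a `PERSON_NAME` column and an `SSN` column, the rule from the example above yields `SENSITIVE` on the parent table. A second rule that requires `SENSITIVE` would be picked up on the next pass.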
List of structured rules
The following is a list of the Privacera-supplied structured rules. For more information about any rule, look at the rule itself in the Platform UI.
Australia Bank Account Number
Australia Bank BSB code
Australia Driver License
IBAN Rule
rule_auto_1P
rule_auto_2P
rule_auto_3P
rule_auto_4P
rule_auto_5M
rule_auto_6M
rule_auto_7M
rule_auto_8M
rule_auto_9M
rule_biometric
rule_biometric_keyword
rule_cc
rule_city_name
rule_criminal_keyword
rule_dob
rule_email
rule_ethnicity_keyword
rule_gps
rule_gps_6_digit
rule_medical_keyword
rule_national_id
rule_password
rule_person_name
rule_phonenumber
rule_pii_id_keyword
rule_political_keyword
rule_religion_keyword
rule_sexual_orientation_keyword
rule_ssn_4_digit
rule_ssn_9_digit
rule_ssn_strict
rule_ssn_strict_fallback
rule_state_name
rule_street_address
rule_tax_id_9_digit
rule_tax_id_strict
rule_trade_union_keyword
Rule US ABA Routing Number
Rule US ABA Routing Number 2
rule_us_dlicense_keyword
rule_us_zip
rule_viewership_keyword
rule_web_keyword
SWIFT BIC Bank ID rule
SWIFT BIC Bank ID Rule 2
UK Driver License Rule
UK Electoral Roll number
UK NHS Rule
UK NHS Rule 2
UK NINO Rule
UK NINO RULE 2
UK Phone Number Rule
UK Postal Code
UK Postal Town
UK US Passport