Privacera Platform

Scan Techniques

Processing order of scan techniques

Privacera Discovery applies tags to dataset attributes using defined rules. It does this by comparing data against dictionaries and models. Which tag is applied depends on the order of the relevant rules: after a rule is triggered, the remaining relevant rules are not processed.

After creating rules, you can reorder them into the necessary sequence to ensure that your data is tagged appropriately. See Reorder Structured Rules for more information.
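The first-match behavior described above can be sketched as follows. This is a minimal illustration, not Privacera's implementation: the `Rule` class, the `apply_rules` function, and the two sample rules are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """Hypothetical stand-in for a Discovery rule: a tag plus a match test."""
    tag: str
    matches: Callable[[str], bool]


def apply_rules(value: str, ordered_rules: list[Rule]) -> Optional[str]:
    """Return the tag of the first rule that matches; later rules are skipped."""
    for rule in ordered_rules:
        if rule.matches(value):
            return rule.tag
    return None


# Two overlapping rules: their order decides which tag is applied.
rules = [
    Rule("EMAIL_ADDR", lambda v: "@" in v),
    Rule("PII", lambda v: len(v) > 0),
]

print(apply_rules("jane@example.com", rules))        # EMAIL_ADDR
print(apply_rules("jane@example.com", rules[::-1]))  # PII
```

Reversing the list changes the result for the same value, which is why reordering rules matters when several rules are relevant to the same data.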

Add and scan resources in a data source

The following example enables scanning on an AWS-Aurora DB resource. It is recommended that you familiarize yourself with the names of the resources you want to enable before scanning as they will appear in a drop-down menu.

To enable scanning on an AWS resource, do the following:

  1. From the navigation menu, select Discovery > Data Source.

  2. From the Applications list, select AWS-Aurora DB.

  3. Click Add to add a resource for scanning.

    1. Type the name of the resource. A list of resources that match the text is displayed.

    2. Select the scan type.

    3. Click Save.

  4. Click the Status toggle to globally enable scanning.

    • For real-time scan, resources will be automatically scanned when they are added to the Included Resources list.

    • For offline scan, click the Scan Resource button to initiate a scan.

  5. Repeat these steps as needed for other data resources or applications you intend to enable for scanning.

    • The names of displayed fields will be different depending on the type of resource or application you are configuring (for example, Include Resource or Include Database or Table).

    • Resources in the landing zones are automatically scanned by Privacera. For more information on Data Zones see Data Zones.

Google Cloud Storage and Google BigQuery

Using a single Google Cloud Storage or Google BigQuery data source, you can scan resources from multiple projects. You can search for projects to be added, and select resources from the project to be included for scanning. To retrieve the list of projects in Google Cloud Storage or Google BigQuery, configure the Google Cloud Manager API.

Note

Data Explorer does not support showing resources from multiple projects. It only shows resources for the project with which the data source is configured.

Prerequisites

To allow Privacera to search for projects in your Google account, you need to enable the API services in the GCP project you registered as a data source. Refer to the Google documentation to enable API services.

Add resources to Google Cloud Storage or Google BigQuery data sources

Before you can add resources to a data source, the data source must be registered and the prerequisites must be met. For more information on registering a data source, see data source registration.

To add resources to Google Cloud Storage or Google BigQuery data sources, do the following:

  1. From the navigation menu, select Discovery > Data Source.

  2. From the Applications section, select a Google Cloud Storage or Google BigQuery data source.

  3. Click Add.

  4. In the Add Resource dialog, enter the following:

    1. Enter the Project ID of the resource you want to scan. You can enter an asterisk (*) to get a list of projects.

      • For Google BigQuery, the Project ID will be appended to the dataset or table name.

      • For Google Cloud Storage, the Project ID is not appended to the bucket name, because bucket names are already unique across projects.

    2. Enter the Resource you are including in the project.

      Note

      Resources can be added from multiple projects. Existing resources will be updated with a project ID. If you have resources in a specific directory, you can add this location path so that all of the databases/tables in that location are scanned.

      • For Google Cloud Storage, add the bucket resources.

      • For Google BigQuery, add the datasets or tables.

    3. Select a scan type:

      • Scan: Select this option if you want to perform real-time/offline scan.

      • Incremental: Select this option if you want to scan the resource once. During a re-scan, the resource is added to the Excluded Resources list.

    4. Multi-input: Turn on this button if you want to switch to a multiple input view and add multiple resources, one per line.

    5. Click Save.

  5. To enable the real-time/offline scan for the Google Cloud Storage or Google BigQuery data source, click the Status toggle.

Start or cancel a scan

There are several ways to start scans in Privacera Discovery:

  • From the Data Source page, which is described here.

  • For offline (re-scan) or realtime (continuous) scans, as detailed in Start Types of Scans.

  • If you have set up datazones, starting a scan, called reevaluation, is discussed in Data Zones.

Start a scan from the Data Source page

To start a scan from the Data Source page, follow these steps:

  1. From the Applications section, select the application that contains the resource you want to scan.

  2. In the Scanning Details section, locate the resource you want to scan.

  3. Click SCAN RESOURCE.

A message appears indicating that a scan has been initiated.

Cancel a scan

To cancel a scan, follow these steps:

  1. Go to the Scan Status page.

  2. Locate the scan that you want to cancel.

  3. Click Cancel.

Tags

Tags are an important part of Privacera Discovery and access control. In addition to security policies for resources and roles, you can create policies based on tags. Using tag-based policies, you can manage access to sensitive data regardless of where the data is stored.

Privacera Discovery scans data sources and tags all sensitive information across the enterprise. Example tags include PERSON_NAME, PII, ADDR, or EMAIL_ADDR. A dataset attribute, such as a column, table, or file, can be tagged with metadata information that can be used to classify the data asset. For example, a column titled "Email" or "Phone_Number" can be tagged as PII.

Tags enrich existing information about your data. Data administrators can create access control policies based on the tags created by Privacera Discovery. You can view your tags from the Tags Information page.

If you have defined rules, the generation of tags depends on the order of the rules. For more information, see Processing order of scan techniques and Reorder structured rules.

Add Tags

You can add tags in Privacera Discovery from the Tags Information page.

To add a tag, follow these steps:

  1. From the Privacera home page, expand the Discovery menu and select Tags Information.

  2. Click the + icon.

    The Add Tag dialog is displayed.

  3. In the Tag Name field, enter a name for the tag.

  4. In the Description field, enter a description of the tag (optional).

  5. Click Save.

    The tag is added.

Edit Tag Descriptions

You can edit the descriptions of tags in Privacera Discovery from the Tags Information page.

Note

You cannot change a tag name after the tag is created.

To edit the description of a tag, follow these steps:

  1. In the Tags Information page, select the tag you want to edit from the Tags list and click Edit.

    The Edit Tag dialog is displayed.

  2. Update the Description field.

  3. Click Save.

    The tag is updated.

Delete Tags

You can delete tags in Privacera Discovery from the Tags Information page.

To delete a tag, follow these steps:

  1. In the Tags Information page, select the tag you want to delete from the Tags list and click Delete.

    The following message is displayed: “Are you sure you want to delete this tag?”

  2. Click Yes to delete the tag or No to return to the Tags Information page.

Search for Tags

You can search for tags in Privacera Discovery from the Tags Information page.

To search for a tag, enter the name of the tag into the Search Tag field.

Add, Edit, or Delete Tag Attributes

The Attributes section displays a list of attributes associated with a tag. You can search the list of attributes using the search box. The Attributes section also displays the total number of records with this tag.

To add an attribute for a specific tag, follow these steps:

  1. In the Tags Information page, select the tag from the Tags list.

  2. Click Add Attribute.

    The Add Attribute dialog displays.

  3. In the Name field, enter the name of the attribute.

  4. In the Value field, enter the value of the attribute.

  5. Click Save.

    The attribute is added to the selected tag.

Note

You can delete or edit the attribute from the Actions column.

Export Tags

To export the tag file in JSON format, follow these steps:

  1. Click Export.

  2. Select the checkbox of each tag you want to export and click Export. You can select multiple tags.

    The tag file is exported.

Import Tags

To import a tag file in JSON format, follow these steps:

  1. Click the Import icon.

    The Import dialog displays.

  2. Select the JSON file you want to import.

  3. Click Save.

    The tag file is imported.

Fetch AWS S3 Tags

Privacera Discovery allows you to fetch AWS S3 tags. There are two types of tags that can be fetched:

  • Object Tags: Tags associated with the AWS S3 object or files in buckets.

  • Bucket Tags: Tags associated with the S3 bucket.

To fetch AWS S3 tags, follow these steps:

  1. Navigate to Discovery > Tags Information and create a tag named AWS_S3_TAG.

  2. Navigate to Settings > Data Source Registration and add or update the application properties as below:

    1. Set "Fetch S3 Object Tags": true

    2. Set "Fetch S3 Bucket Tags": true

      Note

      By default these properties are disabled and set to false.

  3. Go to Data Inventory > Classifications and click AWS_S3_TAG under the Tag column.

  4. Click the View attributes link.

    AWS S3 tags will be displayed in the Data Info grid.

Note

  • If the AWS_S3_TAG tag is not created, AWS S3 tags are not fetched and the tag is not displayed on the Classification page.

  • If both the Object and Bucket tags are enabled and have a common tag, then the Object tag will override the Bucket tag. For example: If the Bucket tag is owner=user1 and the Object tag is owner=user2, then the AWS_S3_TAG tag will have owner=user2 as its attribute.

  • Tags fetched from AWS S3 will be added as attributes of the AWS_S3_TAG. This tag with attributes will be synced to Apache Ranger. Verify using the following URL: https://<EC2_Instance_IP>:6182/service/tags/tags.
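The override rule in the note above (an Object tag wins over a Bucket tag with the same key) behaves like a dictionary merge. The sketch below only illustrates that rule; `merge_s3_tags` is a hypothetical helper, not a Privacera function, and the `env` tag is made up for the example.

```python
def merge_s3_tags(bucket_tags: dict[str, str], object_tags: dict[str, str]) -> dict[str, str]:
    """Combine bucket and object tags; on a shared key the object tag wins."""
    # Later entries in a dict literal overwrite earlier ones with the same key.
    return {**bucket_tags, **object_tags}


attributes = merge_s3_tags(
    {"owner": "user1", "env": "prod"},  # bucket tags
    {"owner": "user2"},                 # object tags
)
print(attributes)  # {'owner': 'user2', 'env': 'prod'}
```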

Dictionaries

Dictionaries are lists of values used to identify data elements. Privacera Discovery matches dictionaries against your resources and data. A dictionary can be applied to either content or metanames.

Example dictionaries include:

  • A dictionary of US person names used to identify names in a database.

  • A dictionary of common column name patterns used to identify a column of account IDs.

Dictionaries support multiple include/exclude patterns. This helps enable a longer transition from conventional patterns for pattern matching. For example, the 'email' conventional pattern and its associated structured and unstructured rules can be disabled and the same pattern value can be added as part of a new dictionary lookup. The resulting rules can then be configured just as conventional patterns.

Types of dictionaries

There are three types of dictionaries in Privacera Discovery:

  • Exact match: the value of the data must exactly match the value in the dictionary.

  • Fuzzy match: the matching is based on fuzzy logic instead of exact match.

  • Pattern match: the values in the dictionary are regular expressions.
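The three match types can be contrasted with a short sketch. The source does not describe how Privacera implements fuzzy matching, so the use of `difflib.SequenceMatcher`, the 0.85 threshold, and whole-value regex matching are assumptions made purely for illustration.

```python
import re
from difflib import SequenceMatcher


def exact_match(value: str, entries: set[str]) -> bool:
    """The value must be identical to a dictionary entry."""
    return value in entries


def fuzzy_match(value: str, entries: set[str], threshold: float = 0.85) -> bool:
    """The value must be similar enough to some entry (assumed similarity metric)."""
    return any(
        SequenceMatcher(None, value.lower(), entry.lower()).ratio() >= threshold
        for entry in entries
    )


def pattern_match(value: str, patterns: list[str]) -> bool:
    """Each dictionary entry is a regular expression tested against the whole value."""
    return any(re.fullmatch(pattern, value) for pattern in patterns)


names = {"Jonathan", "Maria"}
print(exact_match("Jonathon", names))        # False: one letter differs
print(fuzzy_match("Jonathon", names))        # True: close to "Jonathan"
print(pattern_match("12345", [r"\d{5}"]))    # True
```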

Dictionary Keys

The key is used by Discovery rules to associate a tag with a resource element. Because a dictionary can be applied to either content or metaname, a naming convention is used for the key:

  • Content dictionary: LOOKUP suffix.

  • Metaname dictionary: KEYWORD suffix.
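As a small illustration of this naming convention, the scope of a dictionary can be inferred from its key. The function name `dictionary_scope` is hypothetical and is not part of the product.

```python
def dictionary_scope(key: str) -> str:
    """Infer what a dictionary applies to from the suffix of its key."""
    if key.endswith("LOOKUP"):
        return "content"
    if key.endswith("KEYWORD"):
        return "metaname"
    return "unknown"


print(dictionary_scope("US_ZIP_LOOKUP"))   # content
print(dictionary_scope("US_ZIP_KEYWORD"))  # metaname
```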

Manage dictionaries

Privacera Discovery comes pre-loaded with a set of useful dictionaries. You can also create your own custom dictionaries and configure rules to use them.

The values in a dictionary can come from a text file that can be uploaded through the portal or directly copied into your installation. For smaller dictionaries, you can add values using the Privacera portal either one by one or with the bulk input interface. For dictionaries that are file-based, you can add additional values or exclude existing values using the Privacera portal.

When a dictionary is created or modified, the updated dictionary becomes available for use within a few minutes.

Add a dictionary

To add a dictionary, follow these steps:

  1. On the Dictionaries page, click the + sign.

    The Add Dictionary dialog is displayed.

  2. Enter the following details:

    • The Name of the dictionary (required)

    • The Description of the dictionary.

    • The Key field is not editable because it is populated by the system. Under Key description, you can optionally add IPv4 and IPv6 address regexes, which are used to look up dictionary content.

    • The required File name.

  3. Select the required Type: Exact, Pattern, or Fuzzy match.

    Note

    For pattern dictionaries, see Pattern Validation.

  4. Select Apply For. The choices are content or metaname. If you select metaname, for pattern type dictionaries, you have the choice to apply the input tags directly to the resource. See Add Meta Tags Directly to Dictionary.

  5. Select the Status (enabled by default).

  6. Click Save.

    The dictionary is added.

Add meta tags directly to a dictionary

When you create a new dictionary of type pattern, you can apply meta tags directly to a data source. The option appears after you select the combination of pattern and metaname.

Upload a dictionary

To upload a dictionary, follow these steps:

  1. In the Dictionaries page, click Upload Dictionary.

    The Upload Dictionary dialog is displayed.

  2. Select the .txt file of the dictionary you want to upload.

  3. Click Save.

    The dictionary file is uploaded.

Edit a dictionary

To edit a dictionary, follow these steps:

  1. In the Dictionaries page, select a dictionary from the dictionary list and click Edit.

    The Edit Dictionary Info dialog is displayed.

  2. Update the required fields.

  3. Click Save.

    The dictionary is updated.

Copy a dictionary

To make a copy of a dictionary, follow these steps:

  1. On the Dictionary page, select a dictionary from the dictionary list and click Create Copy.

  2. The Copy Dictionary Info dialog is displayed with the selected Type and Apply For values.

  3. Enter the following details:

    • Enter the Name of the dictionary (required).

    • Enter the Description of the dictionary.

    • Enter the File name (required).

    • Select the Type (required).

    • Select the Apply For (required).

    • Select the Status (enabled by default).

  4. Click Save.

    A copy of the dictionary is created.

Enable or disable a dictionary

To enable or disable a dictionary, follow these steps:

  1. On the Dictionaries page, select a dictionary from the Dictionary list.

  2. Click the Status toggle to enable or disable the dictionary.

Search for a dictionary

To search for a dictionary, navigate to the Dictionaries page and enter the dictionary name into the search bar.

Dictionary tour

To see an explanation of the different components of a dictionary, click Tour on the Dictionaries page.

Include a Dictionary

You can filter the list of included dictionaries using the search included dictionary option. This tab also displays the current count of records relying on the dictionary.

The Include Dictionary tab displays the following:

  • Name: Name of the dictionary.

  • Description: The lookup/keyword description.

  • Actions: Edit or delete dictionaries.

  • Bulk Edit/Delete: Select this to edit or delete the dictionary values in bulk. After selecting, click x to delete the values.

Add keywords to an included dictionary

To add a keyword or lookup under Include Dictionary, follow these steps:

  1. On the Dictionaries page, select a dictionary from the dictionary list.

  2. In the Include Dictionary tab, click ADD.

    The Add Dictionary dialog is displayed.

  3. Enter the name of the keyword or lookup, one name per line.

  4. Add a Description for the dictionary name.

  5. Click Save.

    The keyword or lookup is added to the selected dictionary in the Include Dictionary tab.

Exclude a dictionary

You can filter the list of excluded dictionaries using the search excluded dictionary option. This tab also displays the total record count.

The Exclude Dictionary tab displays the following information:

  • Name: Indicates name of the dictionary.

  • Actions: Allows you to edit and delete the dictionary.

To add a lookup in the Exclude Dictionary tab, follow these steps:

  1. On the Dictionaries page, select a dictionary from the Dictionary list.

  2. Select the Exclude Dictionary tab and click +Add.

    The Add Dictionary dialog displays.

  3. In the Name field, enter the names of the dictionaries, one name per line.

  4. In the Description field, enter a description for the dictionary.

  5. Click Save.

    The lookup is added to the selected dictionary.

Import a dictionary

To import a dictionary in JSON format, follow these steps:

  1. On the Dictionaries page, click Import.

    The Import dialog is displayed.

  2. Select the JSON file of the dictionary you want to import and click Save.

    The dictionary configuration file is imported.

Export a dictionary

To export a dictionary in JSON format, follow these steps:

  1. On the Dictionaries page, click Export.

  2. Select the checkbox of each dictionary you want to export and click Export.

    Note

    You can select multiple dictionaries.

    The dictionary file is exported.

Test dictionaries

Pattern validation

If the dictionary is of type pattern, you can validate its regexes.

To validate a pattern, follow these steps:

  1. In the Dictionaries page, add a new dictionary of type Pattern.

    The Add Dictionary field for the pattern type is displayed.

  2. Enter a complex Expression (regex).

  3. Enter the Description for the expression.

  4. Enter the Input Test Data.

  5. Click Test Expression.

The message "Passed" or "Failed" appears in the Test Output field.

Test against a data source

To test changes to a dictionary, follow these steps:

  1. Perform an offline scan of the data source that has sensitive fields you want to test.

  2. Check the Scan Status.

  3. After the scan is completed, open the resource to verify if the scan classified the tags correctly.

The tags are classified under Data Inventory > Classification.

List of Privacera-supplied dictionaries

The following is a list of the Privacera-supplied dictionaries. The name of a dictionary in general describes the purpose of the dictionary. For precise details, look at the dictionary itself in the Platform UI.

  • AU_BSB_LOOKUP

  • BINARY_MIME_KEYWORD

  • CC_KEYWORD

  • CC_PROTECTED_KEYWORD (Disabled)

  • CITY_KEYWORD

  • COUNTY_KEYWORD

  • CRIMINAL_RECORD_LOOKUP

  • DISALLOW_DOB_KEYWORD (Disabled)

  • DISALLOW_NAME_KEYWORD (Disabled)

  • DISALLOW_ZIP_KEYWORD (Disabled)

  • DOB_KEYWORD

  • ETHNICITY_LOOKUP

  • EXEC_MIME_KEYWORD

  • GEO_KEYWORD

  • GPS_KEYWORD

  • IMAGE_MIME_KEYWORD

  • ISO3166_CC_LOOKUP

  • MEDICAL_RECORD_LOOKUP

  • ORG_LOOKUP

  • PASSPORT_KEYWORD

  • PASSWORD_KEYWORD

  • PERSON_NAME_KEYWORD

  • PERSON_NAME_LOOKUP

  • PII_ID_KEYWORD

  • SSN_KEYWORD

  • STATE_KEYWORD

  • SWIFT_BIC_KEYWORD (Disabled)

  • SWIFT_BIC_LOOKUP (Disabled)

  • TAX_ID_KEYWORD

  • UK_ELECTORAL_ROLL_KEYWORD (Disabled)

  • UK_NHS_KEYWORD (Disabled)

  • UK_NINO_KEYWORD (Disabled)

  • UK_POSTAL_TOWN_LOOKUP (Disabled)

  • US_ABA_NUMBER_KEYWORD (Disabled)

  • US_ADDRESS_KEYWORD

  • US_CITY_KEYWORD

  • US_CITY_LOOKUP

  • US_COUNTY_KEYWORD (Disabled)

  • US_COUNTY_LOOKUP (Disabled)

  • US_DLICENSE_KEYWORD

  • US_DLICENSE_LOOKUP

  • US_STATE_KEYWORD

  • US_STATE_LOOKUP

  • US_ZIP_KEYWORD

  • US_ZIP_LOOKUP

Patterns

Patterns are deprecated

Patterns are deprecated. Embed patterns in dictionaries instead.

Note

In a future release Discovery patterns will be removed from the left nav, because they are not used frequently. Instead, customers should now embed patterns in dictionaries. If you have any patterns in use, you should move them to dictionaries now.

Patterns are regular expressions (regexes) that match specific data elements in your data resources.

Privacera-supplied regular expressions can match common patterns like email addresses and URLs.

You can also define your own regexes to isolate patterns in your data to augment Privacera's patterns.

Add patterns

To add a pattern, do the following:

  1. From the navigation menu, select Discovery > Patterns.

  2. Click Add Pattern.

    The Add Pattern dialog is displayed.

  3. In the Pattern Name field, enter a pattern name.

  4. From the Applied On dropdown menu, select one of the following options:

    • All: Pattern matching is applied at the file level (default).

    • File Content: Pattern matching is applied to the content of the file.

    • File Name: Pattern matching is applied based on file name.

    • Table/Column Name: Pattern matching is applied based on table or column name.

  5. Using the Regex Status toggle, enable (default) or disable regexes.

  6. In the Expression field, enter an expression. For example, an expression for a bank account number might be \b(\d{9}|\d{12})\b.

  7. In the Description field, enter a description.

  8. In the Input Test Data field, enter your test data.

  9. Click Test Expression to test the expression against the data you entered in the Input Test Data field.

    The test result is displayed in Test Output field.

  10. Click Save.

The pattern is added.
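Before saving a pattern, it can help to try the expression locally. The sketch below assumes the bank account example is meant as `\b(\d{9}|\d{12})\b` (9 or 12 digits between word boundaries) and mimics a test against sample input. The `check_expression` helper and its "Passed"/"Failed" strings are illustrative and are not the product's test logic.

```python
import re

# Assumed intended form of the bank account example:
# exactly 9 or exactly 12 digits, delimited by word boundaries.
ACCOUNT_EXPRESSION = r"\b(\d{9}|\d{12})\b"


def check_expression(expression: str, test_data: str) -> str:
    """Report whether the expression matches anywhere in the test data."""
    return "Passed" if re.search(expression, test_data) else "Failed"


print(check_expression(ACCOUNT_EXPRESSION, "acct 123456789 closed"))  # Passed
print(check_expression(ACCOUNT_EXPRESSION, "acct 1234567890"))        # Failed (10 digits)
```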

Edit patterns

To edit a pattern, do the following:

  1. On the Patterns page, locate the pattern you want to edit and click the Edit icon in the Actions column.

  2. Update the required fields.

  3. Click Save.

The pattern is updated.

Delete patterns

To delete a pattern, do the following:

  1. On the Patterns page, locate the pattern you want to delete and click the Delete icon in the Actions column.

    You are prompted with a message to confirm the deletion.

  2. Click Yes to delete the pattern.

The pattern is deleted.

Search for patterns

To search for a pattern, enter the pattern name in the search bar on the Patterns page and press Enter.

The search results are displayed.

Export JSON pattern files

To export patterns to a file in JSON format, do the following:

  1. On the Patterns page, click Export.

  2. Select the patterns you want to export and click Export.

The pattern file is exported.

Import JSON pattern files

To import a pattern file in JSON format, do the following:

  1. On the Patterns page, click Import.

  2. Select the JSON file you want to import and click Save.

The pattern file is imported.

List of Privacera-supplied patterns

The following is a list of the Privacera-supplied patterns. You can view details about each of the patterns from the Patterns page.

  • ACCOUNT

  • CREDIT_CARD

  • EMAIL

  • FINANCIAL

  • IPV4

  • IPV6

  • MAC_ADDRESS

  • STREET_ADDRESS

  • UK_DRIVER_LICENSE

  • UK_ELECTORAL

  • UK_NINO

  • UK_POSTAL_CODE

  • UK_US_PASSPORT

  • URL

  • ZIPCODE

Scan status

After you trigger a manual scan, you can check the progress of the scan from the Scan Status page.

During a manual or offline rescan, if a file under a specified directory no longer exists, the scan marks the data as deleted in Classification. This applies only when realtime scan is disabled. The deleted resources are stale and can be viewed under Stale Resources.

Scan IDs that have not resulted in any tag classification are periodically removed from the status page.

Status summaries

To check the status of your scans, select Discovery > Scan Status from the navigation menu.

Scans can have the following statuses:

  • Pending: Number of scan requests in pending state.

  • Listing: Number of scan requests in listing state.

  • Running: Number of scan requests in running state.

  • Success: Number of successfully completed scan requests.

  • Failed: Number of failed scan requests.

  • Killed: Number of killed scan requests.

  • Cancelled: Number of cancelled scan requests.

  • Retry: Number of scan requests moved into retry state.

Note

Scanning durations are shown for data in different stages. For example, Listing shows the time taken to scan the existing data.

List of individual scans

The Scan Status page displays a table of individual scans that includes the following information:

  • Scan Id: The scan ID with a clickable link to view a summary of the scan.

    The Scan Type is shown as Scan (which is a full scan) or Incremental.

  • Status: The status of the scan request.

  • Scan/Total Resource: The number of files or tables scanned out of the total number present in the scan request.

  • Application: The name of the application, such as Hadoop-Hive or Azure-ADLS.

  • Resource: The name of the resource. Click the resource to view the classification page for that resource.

  • Create Time: The date and time that the scan was triggered.

  • Start Time: The start time of the scan.

  • End Time: The end time of the scan.

  • Duration: The scan duration.

  • Request User: The name of the user who triggered the scan.

  • Type: The type of scan, such as offline or realtime scan.

  • Policy: The name of the policy.

View Individual Summary Reports

Click View Summary Report in the Scan Id column to view Summary Info details for the selected scan ID such as Tagged Resources, UnTagged Resources, Excluded, Failed Resources, Properties, Diagnostic Info, Logs, Scan Cleanup, and Stale Resources.

Export Scan Summary

To download the scan summary, click Export and follow the prompts.

Data zone movement

To view a summary of data zone movement, select Compliance Workflow > Data Zone Movement from the navigation menu.

View undefined data zone movements

On the Data Zone Movement page, click Show Undefined Zone Movements to view undefined zone movements.

Filter data zone movements

You can filter the data zone list using the Filter Data Zone option. You can also filter data zone movements by date range, including:

  • Today

  • Yesterday

  • Last 30 Days

  • This Month

  • Last Month

  • Custom Range

Note

By default, the date range is set to Last 7 Days.

Click Refresh to refresh the list of data zones.

Models

Models detect specific data elements in your data resources. The detection is done with various algorithms and heuristics.

Types of models

Privacera supports different types of models. You can filter the list of models using the search model option. This tab also displays the current record count.

Generic models

These are various general model parameters you can use to tailor matching of data.

| Parameter | Data Type | Default | Description |
| --- | --- | --- | --- |
| INCLUDE_PATTERN_<#> | String | None | Patterns to be matched. Can contain more than one pattern by changing the value of the <#> variable. For example: INCLUDE_PATTERN_1, INCLUDE_PATTERN_2, INCLUDE_PATTERN_3. |
| EXCLUDE_PATTERN_<#> | String | None | Patterns to be excluded from matching. Can contain more than one pattern by changing the value of the <#> variable. For example: EXCLUDE_PATTERN_1, EXCLUDE_PATTERN_2, EXCLUDE_PATTERN_3. |
| ONLY_DIGITS | Boolean | FALSE | Indicates whether matching should use only the digits. Setting this parameter to TRUE removes all non-numeric characters in the string before matching. For example, 1234-5 is treated as 12345. |
| CHECK_DIGIT_CODE_VALIDATE | String | None | Indicates whether to evaluate a checksum digit based on the last digit. Valid values: LUHN, ABA, CUSIP, DIHEDRAL, IBAN, UK_NHS, MOD11, ISBN10. |
| DO_LOOKUP | Boolean | FALSE | Indicates whether to use patterns specified by the LOOKUP_PATTERN parameter. If this parameter is set to TRUE, the patterns specified in LOOKUP_PATTERN are used. |
| LOOKUP_DICT | String | None | A dictionary name or key. See Dictionaries. |
| LOOKUP_PATTERN | String | None | Pattern for matching. See Patterns. Note: See Embed Patterns in Dictionaries. |
| ISO3166_CC_VALIDATE_FLAG | Boolean | FALSE | Indicates whether to use Privacera-defined matching to validate an ISO two-character country code. If this parameter is set to TRUE, ISO3166_CC_PATTERN is used. |
| ISO3166_CC_PATTERN | | None | A valid pattern for matching country codes. See Patterns. Note: See Embed Patterns in Dictionaries. |
| ISO3166_CC_LOOKUP_KEY | | None | Name of a defined dictionary. See Dictionaries. |
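To make the interplay of the include, exclude, and ONLY_DIGITS parameters concrete, here is a rough sketch. The source does not specify evaluation order or whether patterns must match the whole value, so this example assumes that exclusions take precedence and that patterns are matched against the full string. `generic_match` is a hypothetical function.

```python
import re
from typing import Sequence


def generic_match(value: str,
                  include_patterns: Sequence[str],
                  exclude_patterns: Sequence[str] = (),
                  only_digits: bool = False) -> bool:
    """Apply ONLY_DIGITS, then exclude patterns, then include patterns."""
    if only_digits:
        # Strip every non-digit character, so "1234-5" becomes "12345".
        value = re.sub(r"\D", "", value)
    # Assumption: a value hit by any exclude pattern is never a match.
    if any(re.fullmatch(p, value) for p in exclude_patterns):
        return False
    return any(re.fullmatch(p, value) for p in include_patterns)


print(generic_match("1234-5", [r"\d{5}"], only_digits=True))  # True
print(generic_match("1234-5", [r"\d{5}"]))                    # False
print(generic_match("00000", [r"\d{5}"], [r"0+"]))            # False
```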

Credit card model

The credit card model detects credit card numbers. It validates numbers based on the issuing network, length, and Luhn checksum.

| Parameter | Type | Default | Meaning |
| --- | --- | --- | --- |
| CC_PATTERN | String | Privacera-supplied pattern for credit card numbers with a range of digits, space or hyphen separated. | Credit card pattern, if you want to override the supplied pattern. |
| DEFAULT_TYPES | Boolean | True | Validate against known issuing network prefixes. |
| LUHN_CHECK | Boolean | True | Validate the Luhn checksum on the credit card number. |

Supported credit card types

| Credit Card Type | Conditions | Examples |
| --- | --- | --- |
| American Express (AMEX) Card | Credit card starting with 34 or 37 and having 15 digits. | 34xxxxxxxxxxxxx, 37xxxxxxxxxxxxx |
| Master Card | Credit card starting with 51 to 55 and having 14 digits; starting with 2221 and having 12 digits; or starting with 27 and having 13 digits. | 51xxxxxxxxxxxx, 2221xxxxxxxx, 27xxxxxxxxxxx |
| Visa Card | Credit card starting with 4 and having 13 or 16 digits. | 4xxxxxxxxxxxx, 4xxxxxxxxxxxxxxx |
| Diners Club Card | Credit card starting with 300 to 305 or 3095 or 36 or 38 or 39 and having 14 digits. | 300xxxxxxxxxxx, 3095xxxxxxxxxx |
| VPay (Visa) Card | Credit card starting with 4 and having 13 or 19 digits. | 4xxxxxxxxxxxx, 4xxxxxxxxxxxxxxxxxx |
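The two validations named by DEFAULT_TYPES and LUHN_CHECK can be sketched as below. This is not Privacera's code: the function names are hypothetical, and `CARD_TYPES` covers only the AMEX and Visa rows of the table to keep the example short. The Luhn algorithm itself is standard.

```python
import re
from typing import Optional

# Subset of the supported types: (name, allowed prefixes, allowed lengths).
CARD_TYPES = [
    ("AMEX", ("34", "37"), (15,)),
    ("VISA", ("4",), (13, 16)),
]


def luhn_valid(number: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    digits = [int(c) for c in re.sub(r"\D", "", number)]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def card_type(number: str) -> Optional[str]:
    """Return the card type whose prefix and length match, if any."""
    digits = re.sub(r"\D", "", number)
    for name, prefixes, lengths in CARD_TYPES:
        if digits.startswith(prefixes) and len(digits) in lengths:
            return name
    return None


def is_credit_card(number: str, default_types: bool = True, luhn_check: bool = True) -> bool:
    """Combine both checks; each can be switched off like its parameter."""
    if default_types and card_type(number) is None:
        return False
    if luhn_check and not luhn_valid(number):
        return False
    return True


print(card_type("4111-1111-1111-1111"))                        # VISA
print(is_credit_card("4111 1111 1111 1111"))                   # True
print(is_credit_card("4111 1111 1111 1112"))                   # False (checksum)
print(is_credit_card("4111 1111 1111 1112", luhn_check=False)) # True
```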

Date of birth model

The Date of Birth model detects various date formats.

| Parameter | Type | Default | Meaning |
| --- | --- | --- | --- |
| MIN_AGE_YEARS | Integer | 5 | Age lower threshold. |
| MAX_AGE_YEARS | Integer | 100 | Age upper threshold. |
| USE_ALGO | Boolean | True | Tagging is done based on an algorithm to detect random distribution. |
| DATE_REGEX_var1 | String | | Pattern that matches a custom date format var1. |
| DATE_FORMAT_var1 | String | | Date Format that matches the pattern for var1. |

Pre-configured date formats are:

  • International YMD format with 4 digit year

  • US MDY with 4 digit or 2 digit year

  • Month abbreviated MDY

Additional formats can be configured. For example, configure a regex and a Java date format:

| Parameter | Value |
| --- | --- |
| DATE_REGEX_1 | \d{4} \d{2} \d{2} |
| DATE_FORMAT_1 | yyyy MM dd |
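The way a custom regex, a date format, and the age thresholds could work together is sketched below. Python's `%Y %m %d` is used as the equivalent of the Java format `yyyy MM dd`. The function `looks_like_dob`, the inclusive thresholds, and the injectable `today` argument are assumptions made for illustration, and the USE_ALGO distribution check is not modeled.

```python
import re
from datetime import date, datetime
from typing import Optional

DATE_REGEX_1 = r"\d{4} \d{2} \d{2}"
DATE_FORMAT_1 = "%Y %m %d"  # Python equivalent of the Java format "yyyy MM dd"
MIN_AGE_YEARS = 5
MAX_AGE_YEARS = 100


def looks_like_dob(text: str, today: Optional[date] = None) -> bool:
    """True if text matches the regex, parses as a date, and gives a plausible age."""
    today = today or date.today()
    if not re.fullmatch(DATE_REGEX_1, text):
        return False
    try:
        born = datetime.strptime(text, DATE_FORMAT_1).date()
    except ValueError:
        return False  # shaped like a date but not a real one, e.g. month 13
    # Subtract one year if this year's birthday has not happened yet.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return MIN_AGE_YEARS <= age <= MAX_AGE_YEARS


reference_day = date(2024, 1, 1)
print(looks_like_dob("1990 06 15", reference_day))  # True
print(looks_like_dob("2022 01 01", reference_day))  # False (age 2)
print(looks_like_dob("1990 13 40", reference_day))  # False (not a valid date)
```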

EIN model

The EIN model detects Employer Identification Numbers using patterns and digit validation.

| Parameter | Type | Default | Meaning |
| --- | --- | --- | --- |
| EIN_PATTERN | String | Default | EIN digit pattern if you want to override the default pattern. |
| VALIDATIONS | Boolean | True | Apply digit validation to the EIN. |
| STRICT_PATTERN | Boolean | True | Allow match only if EIN has exact format. |

Geo latitude and longitude model

The geo model detects latitude and longitude coordinates. It can validate these values based on a geographical area.

| Parameter | Type | Default | Meaning |
| --- | --- | --- | --- |
| MIN_LAT | Double | US min latitude | Lower limit (southern) on latitude. |
| MAX_LAT | Double | US max latitude | Upper limit (northern) on latitude. |
| MIN_LONG | Double | US min longitude | Lower limit (west) on longitude. |
| MAX_LONG | Double | US max longitude | Upper limit (east) on longitude. |
| MIN_FRACTIONAL_DIGITS | Integer | 3 | Minimum number of digits after the decimal point. |
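A bounding-box check with a minimum number of fractional digits might look like the sketch below. The numeric bounds are rough placeholders for the contiguous United States chosen for this example only. The source does not state the actual default values, and `is_geo_pair` is a hypothetical function.

```python
def fractional_digits(text: str) -> int:
    """Count the digits after the decimal point in a numeric string."""
    _, _, fraction = text.partition(".")
    return len(fraction)


def is_geo_pair(lat_text: str, long_text: str,
                min_lat: float = 24.0, max_lat: float = 50.0,        # placeholder bounds
                min_long: float = -125.0, max_long: float = -66.0,   # placeholder bounds
                min_fractional_digits: int = 3) -> bool:
    """True if both values parse, are precise enough, and fall inside the box."""
    try:
        lat, lon = float(lat_text), float(long_text)
    except ValueError:
        return False
    if min(fractional_digits(lat_text), fractional_digits(long_text)) < min_fractional_digits:
        return False
    return min_lat <= lat <= max_lat and min_long <= lon <= max_long


print(is_geo_pair("37.7749", "-122.4194"))  # True: inside the box
print(is_geo_pair("37.7", "-122.4194"))     # False: too few fractional digits
print(is_geo_pair("51.5074", "-0.1278"))    # False: outside the box
```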

IMEI model

The IMEI model detects International Mobile Equipment Identity numbers that are used to identify mobile phones. It validates the Luhn checksum and the length of the IMEI.
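As an illustration of the two checks mentioned above, the sketch below tests for the standard IMEI length of 15 digits and a valid Luhn checksum. Stripping spaces and hyphens before checking is an assumption, and `is_imei` is a hypothetical function name.

```python
import re


def is_imei(value: str) -> bool:
    """True if value has 15 digits (ignoring spaces/hyphens) and passes Luhn."""
    digits = re.sub(r"[\s-]", "", value)
    if not re.fullmatch(r"\d{15}", digits):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(is_imei("490154203237518"))  # True
print(is_imei("490154203237519"))  # False (bad check digit)
```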

ITIN model

The ITIN model detects Individual Taxpayer Identification Numbers (identifiers of individual taxpayers). It validates the format and digits of the ITIN.

Parameter

Type

Default

Meaning

ITIN_PATTERN

String

Default

ITIN digit pattern if you want to override the default pattern.

STRICT_PATTERN

Boolean

True

Allow match only if ITIN has exact format.

MIME model

The MIME model detects a file based on its Multipurpose Internet Mail Extensions type. The MIME type is detected using a combination of file extension and magic bytes in the header of the file. The detected MIME type is then looked up in a dictionary of MIME types.

Parameter

Type

Default

Meaning

LOOKUP_DICT

String

Identifier of dictionary of MIME types.

There are two pre-configured MIME model instances.

  • For detecting executable files: LOOKUP_DICT=EXEC_MIME_KEYWORD.

  • For detecting image files: LOOKUP_DICT=IMAGE_MIME_KEYWORD.
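The detect-then-look-up flow described above can be sketched as follows. The dictionary contents, the short signature list, and the choice to prefer magic bytes over the file extension are all assumptions made for illustration; the real entries behind IMAGE_MIME_KEYWORD are not listed here.

```python
import mimetypes

# Hypothetical stand-in for the dictionary referenced by LOOKUP_DICT.
IMAGE_MIME_TYPES = {"image/png", "image/jpeg", "image/gif"}

# A few well-known magic-byte signatures found at the start of a file.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def detect_mime(filename: str, header: bytes):
    """Use magic bytes when they match; otherwise fall back to the extension."""
    for signature, mime in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return mime
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed


def matches_dictionary(filename: str, header: bytes, dictionary: set) -> bool:
    """Return True if the detected MIME type is listed in the dictionary."""
    return detect_mime(filename, header) in dictionary
```

Checking the header bytes catches files whose extension has been changed, which an extension-only lookup would miss.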

Phone number model

The Phone Number model detects phone numbers. It validates the format of the phone numbers based on the country for which it is configured.

Parameter

Type

Default

Meaning

COUNTRY_CODE

String

US

Two-character country code.

SSN model

The SSN model detects US Social Security Numbers. It validates the format and checks against a blacklist of SSNs.

Parameter

Type

Default

Meaning

SSN_PATTERN

String

Default

Override the default SSN pattern.

VALIDATIONS

Boolean

True

Validate against known blacklist of SSNs.

STRICT_PATTERN

Boolean

False

Allow match only if SSN has exact format.

USE_9_DIGIT_PATTERN

Boolean

False

Match against any nine digit number without format.

USE_4_DIGIT_PATTERN

Boolean

False

Match against any four digit number without format. Disables validation with blacklist of SSN.

STRICT_EXT_PATTERN

Boolean

True

Allow match only if SSN has exact format that is hyphen-, dot-, or space-separated.

Examples of Invalid SSNs

The SSN model would determine that the following SSNs are invalid.

  • SSN starting with 9 or 666 or 000 or 98765432.

  • SSN with 00 as the 4th and 5th digits.

  • SSN with 0000 as the sixth through ninth digits.

  • Any SSN like these:

    • 123456789

    • 111111111

    • 222222222

    • 333333333

    • 444444444

    • 555555555

    • 666666666

    • 777777777

    • 888888888

    • 999999999
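The invalid cases listed above can be expressed as a small check. This is a sketch under assumptions: the function name is hypothetical, and stripping hyphen, dot, and space separators before testing is an illustrative choice, not documented behavior.

```python
import re

NINE_DIGITS = re.compile(r"[0-9]{9}")
# 123456789 plus the nine repeated-digit values from the list above.
LISTED_INVALID = {"123456789"} | {str(d) * 9 for d in range(1, 10)}


def is_invalid_ssn(ssn: str) -> bool:
    """Return True if the value fails any of the listed validity rules."""
    digits = re.sub(r"[-. ]", "", ssn)
    if NINE_DIGITS.fullmatch(digits) is None:
        return True
    return (
        digits.startswith(("9", "666", "000", "98765432"))
        or digits[3:5] == "00"    # fourth and fifth digits
        or digits[5:] == "0000"   # sixth through ninth digits
        or digits in LISTED_INVALID
    )
```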

VIN model

The VIN model detects Vehicle Identification Numbers. It validates the length and the VIN checksum.
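For reference, the sketch below shows the widely used North American VIN check-digit calculation (position 9) together with the 17-character length check. Whether Privacera Discovery uses exactly this scheme is an assumption, and the function name is hypothetical.

```python
# Letter values for the check-digit calculation; I, O, and Q are not valid in a VIN.
TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# Position weights; the check digit itself (position 9) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def is_vin(value: str) -> bool:
    """Validate length, allowed characters, and the position-9 check digit."""
    vin = value.upper()
    if len(vin) != 17 or any(char not in TRANSLITERATION for char in vin):
        return False
    remainder = sum(TRANSLITERATION[c] * w for c, w in zip(vin, WEIGHTS)) % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected
```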

Zip model

The Zip model detects US Zip codes. It detects both 5 digit and 5+4 digit variations and validates against a dictionary of US Zip codes.

Parameter

Type

Default

Meaning

ZIP_DICT_KEY

String

US_ZIP_LOOKUP

Key of the US Zip dictionary.

ZIP_PATTERN

String

Default

Regular expression used to match Zip codes in content. Set it to override the default pattern.

STRICT_PATTERN

Boolean

False

Allow match only if Zip code has exact format. If set to true, only the hyphenated 5+4 format (five digits, a hyphen, then four digits) is considered a Zip code.
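The interaction between the pattern, the strict setting, and the dictionary can be sketched as follows. The regular expressions and the three-entry dictionary are stand-ins chosen for illustration; they are not the product's default ZIP_PATTERN or the contents of US_ZIP_LOOKUP.

```python
import re

ZIP_ANY = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")  # 5-digit or 5+4 format
ZIP_STRICT = re.compile(r"[0-9]{5}-[0-9]{4}")    # 5+4 format only

# Tiny stand-in for the US Zip dictionary.
US_ZIP_LOOKUP = {"10001", "60601", "94105"}


def is_zip(value: str, strict: bool = False) -> bool:
    """Match the format, then confirm the 5-digit prefix is in the dictionary."""
    pattern = ZIP_STRICT if strict else ZIP_ANY
    if pattern.fullmatch(value) is None:
        return False
    return value[:5] in US_ZIP_LOOKUP
```

The dictionary lookup is what keeps arbitrary five-digit numbers from being tagged as Zip codes.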

Create a model

To create a model, follow these steps:

  1. From the navigation menu, select Discovery > Models.

  2. Click Add Model.

    The Add Model dialog is displayed.

  3. In the Name field, enter a name for the model.

  4. In the Description field, enter a description of the model.

  5. In the Key field, enter a model key.

  6. From the Type dropdown menu, select a model type.

    Note

    See Types of Models for more information.

  7. From the Apply For dropdown menu, select File content.

    Note

    File content is resource content.

  8. Enable or disable the model using the Model Status toggle.

  9. Add model properties by clicking +.

  10. Enter a key and value in the Key and Value fields.

    Note

    For example: Key: MIN_FRACTIONAL_DIGITS, Value: 2. You can add multiple model properties.

  11. Click Save.

    The model is created.

Edit a model

You can edit a model by clicking the Edit icon in the Actions column.

To edit a model, follow these steps:

  1. Click the Edit icon in the Actions column.

    The Edit Model dialog displays.

  2. Make your desired changes.

  3. Click Save.

    The model is updated.

Delete a model

You can delete a model by clicking the Delete icon in the Actions column.

To delete a model, follow these steps:

  1. Click the Delete icon in the Actions column.

    The Confirm Delete dialog displays.

  2. Select Delete to confirm the deletion.

    The model is deleted.

Import a model

To import a model file in JSON format, follow these steps:

  1. In the Models home page, click the Import option.

    The Import dialog is displayed.

  2. Browse and select the JSON file and click Import.

The model file is imported.

Export a model

To export a model file in JSON format, follow these steps:

  1. In the Models page, click Export.

  2. From the drop-down menu, select one of the following options:

    • All Records: Export the entire set of models.

    • Select Records: Select the specific model to export. You can select multiple models.

  3. Click Export.

    The JSON file is exported.

List of Privacera-supplied models

The following is a list of the Privacera-supplied models. For precise details, look at the model itself in the Platform UI.

  • DOB_ML_MODEL

  • CC_ML_MODEL

  • ZIP_ML_MODEL

  • IMEI_ML_MODEL

  • SSN_ML_MODEL

  • EXEC_ML_MODEL

  • MIME_ML_MODEL

  • PHONE_NUMBER_ML_MODEL

  • GEO_LAT_LONG_ML_MODEL

  • CC_ML_MODEL_PROTECTED

  • EIN_ML_MODEL

  • ITIN_ML_MODEL

  • VIN_ML_MODEL

  • SSN_9_DIGIT_ML_MODEL

  • SSN_4_DIGIT_ML_MODEL

  • IMAGE_FILE_ML_MODEL

  • IMAGE_ML_MODEL

Disallowed Tags policy

This policy monitors resources and raises an alert if any of the disallowed PII tags are identified. You can add multiple tags by pressing Enter after each value.

The Disallowed Tags policy has the following fields:

  • Name: The name of the Disallowed Tags policy.

  • Type: The type of policy.

  • Alert Level: The alert level: high, medium, or low.

  • Description: The description of the Disallowed Tags policy.

  • Disallowed Tags: Allows you to add multiple tags to be disallowed.

Add Disallowed Tags policy

If you are creating Disallowed Movement and Disallowed Tags policies, then you can capture data zone movement using Spark. Data Zone movement can be captured in HDFS to S3.

To capture Data Zone movement using Spark, follow these steps:

Note

These data zones are examples. You should create your own.

  1. Create directories in HDFS and add a file to one of them:

    hdfs dfs -mkdir -p /colour/purple
    hdfs dfs -mkdir -p /colour/pink
    hdfs dfs -put /finance_us.csv /colour/purple/
    
  2. Add both directories to the Include Resource list of the HDFS data source.

  3. Create two Data Zones and add one directory to the resources of each:

    • SourceDz: Has a resource such as /colour/purple/ and the Data Zone tag.

    • DestinationDz: Has a resource such as /colour/pink/ and the policies configured for disallowed movement and disallowed tags.

  4. Set the Application property as follows:

    Generate Alert All Part Files = false

    Note

    If you set Generate Alert All Part Files to false, the system generates an alert for the first two part files. If you set this property to true, the system generates an alert for all part files.

  5. Open a terminal, start the Spark shell, and run the following commands:

    spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
    scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/colour/purple/finance_us.csv")
    scala> df.coalesce(1).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").save("/colour/pink/finance_us_11")
    scala> df.repartition(4).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").save("/colour/pink/finance_us_100")

    Verify the results in the following places:

    • Kafka Topics: Check the Kafka topics audit consumption for Alerts and Lineage.

    • Alerts Details: Check the Alerts Details tab on the resource details for this resource.

    • Lineage: Check the Lineage for this resource.

    • Alerts generated for part files: Check the Data Zone Graph for the alerts generated for the part files in DestinationDz.

Rules

You can create and manage custom and system-provided rules in Privacera Discovery. By executing the conditions in each rule, Discovery applies classifications to your data. The output tag associated with the processed rule is applied to the resource as the final tag.

The generation of tags depends on the order of the rules. See Processing Order of Scan Techniques and Reorder Structured Rules.

You can also create rule mappings.

Types of rules

There are three types of rules in Privacera Discovery:

  • Structured

  • Unstructured

  • Post-processing

Example rules and classifications

Based on the tags that structured or unstructured rules find in a file, or in the columns of a table, you can assign a tag to the file or the table as a whole. Each rule is an AND condition over output tags. For example, you can set multiple rules as follows:

  1. If a file has PERSON_NAME AND EMAIL AND SSN, tag as PII.

  2. If a file has USER_ID AND GEO, tag as SENSITIVE.

  3. If a file has USER_ID AND IP, tag as SENSITIVE.
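The three example rules above can be modeled as set comparisons: a rule applies when all of its required tags are present. This is a minimal sketch; the `RULES` structure and the `derive_tags` function are hypothetical, and collecting the output of every matching rule is an assumption made for illustration.

```python
# Each entry: (tags that must all be present, tag assigned to the file or table).
RULES = [
    ({"PERSON_NAME", "EMAIL", "SSN"}, "PII"),
    ({"USER_ID", "GEO"}, "SENSITIVE"),
    ({"USER_ID", "IP"}, "SENSITIVE"),
]


def derive_tags(found_tags) -> set:
    """Return the output tags of all rules whose required tags are all present."""
    found = set(found_tags)
    return {output for required, output in RULES if required <= found}
```

For example, a file tagged with USER_ID, IP, and EMAIL yields SENSITIVE, while a file tagged only with EMAIL yields no additional tag.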

Create a structured rule

To create a structured rule, follow these steps:

  1. From the navigation menu, select Discovery > Rules.

  2. On the Rules page, click Structured > Create Rule.

    The Create Rule dialog is displayed.

  3. In the Create Rule dialog, enter the following details:

    • Name: The name of the rule.

    • Description: A description of the rule (optional).

    • Must Have: From the dropdown menu, select dictionaries, patterns, or models to be included in the rule.

    • Must Not Have: From the dropdown menu, select dictionaries, patterns, or models to be excluded from the rule.

    • Score Type: From the dropdown menu, select one of the following options:

      • Auto: If the rule is applied, the resource is classified as System.

      • Review: If the rule is applied, the resource is classified as Pending Review.

    • Output Tags: The tags associated with the rule.

    • Key For Samples: The keys from the objects in the Must Have dropdown menu.

    • Enable rule: Using the toggle, enable or disable the rule.

  4. Review the information in the Rule preview section.

  5. Click Save.

    The structured rule is created.

Reorder structured rules

Rule order decides the priority of the rules applied during classification.

To reorder rules, follow these steps:

  1. On the Rules page, click Reorder.

  2. Drag the rules up or down to change the order.

  3. Click Save Order.

    The new order is saved.
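Because processing stops at the first rule that triggers, the saved order determines which output tag is applied. The sketch below illustrates that first-match behavior; the `Rule` class, its fields, and the key names used in the usage note are hypothetical and do not reflect Privacera's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    must_have: set
    output_tag: str
    must_not_have: set = field(default_factory=set)
    enabled: bool = True


def first_matching_rule(rules, matched_keys):
    """Walk the rules in their saved order and return the first one that triggers."""
    for rule in rules:
        if not rule.enabled:
            continue
        has_required = rule.must_have <= matched_keys
        has_excluded = bool(rule.must_not_have & matched_keys)
        if has_required and not has_excluded:
            return rule  # Later rules are not evaluated.
    return None
```

If a column matches both a hypothetical SSN_STRICT key and a NINE_DIGIT key, placing the strict SSN rule before a generic nine-digit rule yields the SSN tag; reversing the order yields the generic tag instead.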

Create an unstructured rule

To create an unstructured rule, follow these steps:

  1. From the navigation menu, select Discovery > Rules.

  2. On the Rules page, click Unstructured > Create Rule.

    The Create Rule dialog is displayed.

  3. Enter the following details:

    • Rule Name: Name of the rule.

    • Description: Description of the rule (optional).

    • Must Have: From the dropdown menu, select dictionaries, patterns, or models to be included in the rule.

    • Must Not Have: From the dropdown menu, select dictionaries, patterns, or models to be excluded from the rule (optional).

    • Word Proximity: Name of a pattern to identify sensitive information within the specified number of words.

    • Key order strict: Using the toggle, indicate whether key order is strictly followed.

    • Enable rule: Using the toggle, enable or disable the rule.

  4. Review the information in the Rule preview section.

  5. Click Save.

The unstructured rule is created.

Create a rule mapping

To create a rule mapping, follow these steps:

  1. From the navigation menu, select Discovery > Rules.

  2. On the Rules page, click Rule Mapping > Add Mapping.

    The Add Key Tag Mapping dialog is displayed.

  3. From the Key dropdown menu, select a dictionary, pattern, or model.

  4. From the Tag dropdown, select a tag.

    Note

    You can add multiple keys and tags by clicking +.

  5. Click Save.

    The rule mapping is created.

Export rules and mappings

To export a rule file in JSON format for a structured rule, follow these steps:

  1. From the navigation menu, select Discovery > Rules.

  2. Click Export.

  3. Select the files you wish to export.

  4. Click Export.

The rule file is exported.

Import rules and mappings

To import a JSON rule file for a structured rule, follow these steps:

  1. From the navigation menu, select Discovery > Rules.

  2. On the Rules page, click Import.

    The Import dialog is displayed.

  3. Click Choose File and select the JSON file.

    Note

    Selecting Clean Previous deletes all existing rules.

  4. Click Save.

The rule file is imported.

Post-processing in real-time and offline scans

With post-processing, the data is scanned and then the rules are applied on the tagged data in multiple passes. Post-processing can be used with both real-time and offline scans. Based on the output tags of the rules applied after the initial scan, with post-processing you can add additional tags on the parent or child data resources.

Post-processing rules should be applied after Data Zone and tag propagation is complete.

For example, after the initial scan of a structured or unstructured file or columns within a table, Privacera Discovery will identify the data and classify them with tags based on the rules. After the initial scan has tagged various columns within a table or a file, you can use post-processing rules to assign additional tags to the file or the parent table.

Enable post-processing

To enable post-processing, follow these steps:

  1. Navigate to Setting > System Configuration.

  2. Search for the property privacera.portal.rules.post_process.enable.

    Note

    The default setting is false.

  3. Set the property to true.

Example of post-processing rules on tags

To create an example post-processing rule, follow these steps:

  1. From the navigation menu, select Discovery > Rules.

  2. On the Rules page, select Post-Processing.

  3. Create a new rule with the following condition: If PERSON_NAME and SSN are found, apply the SENSITIVE tag.

  4. Rescan the file to apply the post-processing rules.

    The fields are now classified as SENSITIVE and the tag is applied in the unformatted view.

List of structured rules

The following is a list of the Privacera-supplied structured rules. For more information about any rule, look at the rule itself in the Platform UI.

  • Australia Bank Account Number

  • Australia Bank BSB code

  • Australia Driver License

  • IBAN Rule

  • rule_auto_1P

  • rule_auto_2P

  • rule_auto_3P

  • rule_auto_4P

  • rule_auto_5M

  • rule_auto_6M

  • rule_auto_7M

  • rule_auto_8M

  • rule_auto_9M

  • rule_biometric

  • rule_biometric_keyword

  • rule_cc

  • rule_city_name

  • rule_criminal_keyword

  • rule_dob

  • rule_email

  • rule_ethnicity_keyword

  • rule_gps

  • rule_gps_6_digit

  • rule_medical_keyword

  • rule_national_id

  • rule_password

  • rule_person_name

  • rule_phonenumber

  • rule_pii_id_keyword

  • rule_political_keyword

  • rule_religion_keyword

  • rule_sexual_orientation_keyword

  • rule_ssn_4_digit

  • rule_ssn_9_digit

  • rule_ssn_strict

  • rule_ssn_strict_fallback

  • rule_state_name

  • rule_street_address

  • rule_tax_id_9_digit

  • rule_tax_id_strict

  • rule_trade_union_keyword

  • Rule US ABA Routing Number

  • Rule US ABA Routing Number 2

  • rule_us_dlicense_keyword

  • rule_us_zip

  • rule_viewership_keyword

  • rule_web_keyword

  • SWIFT BIC Bank ID rule

  • SWIFT BIC Bank ID Rule 2

  • UK Driver License Rule

  • UK Electoral Roll number

  • UK NHS Rule

  • UK NHS Rule 2

  • UK NINO Rule

  • UK NINO RULE 2

  • UK Phone Number Rule

  • UK Postal Code

  • UK Postal Town

  • UK US Passport