Configuration to Use Metadata Dictionaries in Unstructured Content Scanning

Privacera Discovery provides the capability to apply metadata dictionaries to unstructured data for classification. By default, metadata dictionaries only scan file names and paths. This feature extends their capability to scan the actual content of unstructured data, including unstructured files (text files, documents, PDFs, etc.) and database columns containing unstructured content (text-heavy columns with more than 5 tokens).

For a detailed overview, refer to the Using Metadata Dictionaries in Unstructured Content Scanning documentation.

Prerequisites

Discovery is installed and running. Refer to the Discovery installation steps.

Setup

To enable metadata dictionary scanning of unstructured content (files and database columns), follow these steps:

Step 1: Update Configuration Property

SSH into the instance where Privacera Manager is installed.
Navigate to the privacera-manager directory using the following command:
Bash
cd ~/privacera/privacera-manager

Add the following property to the Discovery custom-vars file for your cloud provider. The file for each provider is listed after the table:

Property Name	Default Value	Possible Values	Description
`DISCOVERY_APPLY_METANAME_DICT_TO_UNSTRUCT`	`false`	`true`, `false`	When enabled, makes metadata dictionaries available for scanning against unstructured content, including unstructured files and database columns with unstructured data (not just file names/paths). Requires creating rules with `c_` prefix.

AWS

Bash
vi config/custom-vars/vars.discovery.aws.yml

Azure

Bash
vi config/custom-vars/vars.discovery.azure.yml

GCP

Bash
vi config/custom-vars/vars.discovery.gcp.yml

Add or update the following variable in the file:

YAML
# Apply Metadata Dictionaries to Unstructured Data
# This enables scanning of unstructured content (files and database columns with >5 tokens)
DISCOVERY_APPLY_METANAME_DICT_TO_UNSTRUCT: "true"

Save the file and exit the editor.
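
Before restarting, it can help to confirm that the property was saved as intended. The helper below is a minimal sketch and not part of Privacera Manager: the function name `check_unstruct_flag` is hypothetical, and it assumes the property sits on its own uncommented line in the form shown above (quotes around the value are treated as optional).

```shell
# Hypothetical helper (not shipped with Privacera Manager).
# Succeeds only if the given vars file contains an uncommented line that
# sets DISCOVERY_APPLY_METANAME_DICT_TO_UNSTRUCT to true (quoted or not).
check_unstruct_flag() {
  vars_file="$1"
  grep -Eq '^[[:space:]]*DISCOVERY_APPLY_METANAME_DICT_TO_UNSTRUCT:[[:space:]]*"?true"?[[:space:]]*$' "$vars_file"
}

# Example usage (AWS file shown; substitute the Azure or GCP file as needed):
# check_unstruct_flag config/custom-vars/vars.discovery.aws.yml && echo "flag is enabled"
```

A non-zero exit status means the property is missing, commented out, or set to a value other than `true`.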

Step 2: Restart Privacera Services

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh setup
./pm_with_helm.sh upgrade 
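
If you script this step, one option is to wrap the two commands so that the upgrade runs only when setup succeeds. The following is a sketch under stated assumptions: the function name `apply_discovery_config` and the `PM_DIR` override are hypothetical, and the default path simply mirrors the directory used in this guide.

```shell
# Hypothetical wrapper (not shipped with Privacera Manager).
# Runs the two documented commands in order and stops if setup fails.
# The body runs in a subshell so the caller's working directory is unchanged.
apply_discovery_config() (
  # PM_DIR is an assumed override; the default matches the path used in this guide.
  pm_dir="${PM_DIR:-$HOME/privacera/privacera-manager}"
  cd "$pm_dir" || exit 1
  if ! ./privacera-manager.sh setup; then
    echo "setup failed; skipping upgrade" >&2
    exit 1
  fi
  ./pm_with_helm.sh upgrade
)

# Example usage:
# apply_discovery_config && echo "configuration applied"
```

The function returns the exit status of the last command that ran, so a calling script can decide whether to continue.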

After restarting, metadata dictionaries will automatically be available for scanning unstructured content. For information on creating classification rules, refer to the Classification Rules documentation.

For more details on how this feature works and best practices, refer to the Using Metadata Dictionaries in Unstructured Content Scanning documentation.