
Configuration to Use Metadata Dictionaries in Unstructured Content Scanning

Privacera Discovery can apply metadata dictionaries to unstructured data for classification. By default, metadata dictionaries scan only file names and paths. This feature extends them to scan the actual content of unstructured data. That content includes unstructured files (text files, documents, PDFs, and so on) and database columns that hold unstructured content (text-heavy columns with more than 5 tokens).

For an overview and further details, refer to the Using Metadata Dictionaries in Unstructured Content Scanning documentation.

Prerequisites

  • Discovery is installed and running. Refer to the Discovery installation steps.

Setup

To enable metadata dictionary scanning in unstructured file content, follow these steps:

Step 1: Update Configuration Property

  1. SSH into the instance where Privacera Manager is installed.
  2. Navigate to the privacera-manager directory using the following command:
    Bash
    cd ~/privacera/privacera-manager
    
  3. Add the following property to the Discovery custom-vars file for your cloud provider:

    Property name: DISCOVERY_APPLY_METANAME_DICT_TO_UNSTRUCT
    Default value: false
    Possible values: true, false
    Description: When enabled, metadata dictionaries are also applied to unstructured content, including unstructured files and database columns with unstructured data, not just to file names and paths. Requires creating rules with the c_ prefix.

    Open the file that matches your cloud provider:

    AWS
    Bash
    vi config/custom-vars/vars.discovery.aws.yml

    Azure
    Bash
    vi config/custom-vars/vars.discovery.azure.yml

    GCP
    Bash
    vi config/custom-vars/vars.discovery.gcp.yml

    Add or update the following variable:

    YAML
    # Apply Metadata Dictionaries to Unstructured Data
    # This enables scanning of unstructured content (files and database columns with more than 5 tokens)
    DISCOVERY_APPLY_METANAME_DICT_TO_UNSTRUCT: "true"
    

  4. Save the file and exit the editor.
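If you prefer to script step 3 rather than editing the file in vi, the Bash function below is a minimal sketch of one way to do it. It assumes the property sits on its own top-level line in the form shown above and that you run it from ~/privacera/privacera-manager. The function name set_unstruct_flag is hypothetical and is not part of Privacera Manager.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Privacera Manager): set
# DISCOVERY_APPLY_METANAME_DICT_TO_UNSTRUCT to "true" in a Discovery
# custom-vars file without opening an editor.
set_unstruct_flag() {
  local vars_file="$1"   # e.g. config/custom-vars/vars.discovery.aws.yml
  local key="DISCOVERY_APPLY_METANAME_DICT_TO_UNSTRUCT"

  # Create the file if it does not exist yet.
  touch "$vars_file" || return 1

  if grep -q "^${key}:" "$vars_file"; then
    # Property already present: rewrite its value in place.
    # A temp file is used instead of `sed -i` so this works with GNU and BSD sed.
    sed "s/^${key}:.*/${key}: \"true\"/" "$vars_file" > "${vars_file}.tmp" \
      && mv "${vars_file}.tmp" "$vars_file"
  else
    # Start on a fresh line if the file does not end with a newline.
    if [ -s "$vars_file" ] && [ -n "$(tail -c 1 "$vars_file")" ]; then
      printf '\n' >> "$vars_file"
    fi
    # Property missing: append it with the same comment used in step 3.
    printf '%s\n' "# Apply Metadata Dictionaries to Unstructured Data" \
      "${key}: \"true\"" >> "$vars_file"
  fi
}
```

For example, on AWS you would run set_unstruct_flag config/custom-vars/vars.discovery.aws.yml and then continue with Step 2. Rewriting an existing line instead of always appending avoids leaving two conflicting values for the same property in the file.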

Step 2: Restart Privacera Services

Run the following commands to apply the configuration and restart the services:

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh setup
./pm_with_helm.sh upgrade

After the services restart, metadata dictionaries are automatically available for scanning unstructured content. Classification of that content also requires rules with the c_ prefix; for information on creating classification rules, refer to the Classification Rules documentation.


For more details on how this feature works and best practices, refer to the Using Metadata Dictionaries in Unstructured Content Scanning documentation.