
Configuration to Enable Part File Tag Propagation to Parent Folders

Privacera Discovery provides the capability to automatically propagate classification tags from scanned part files to their parent folders. This feature operates through a background service that must be explicitly enabled.

For detailed information on how the feature works, refer to the Part File Tag Propagation user guide.

Prerequisites

  • Discovery is installed and running. Refer to the Discovery installation steps.
  • Data sources (S3, GCS, or ADLS) are configured.

Configuration

Part file tag propagation stays off until you set the Discovery-level property DISCOVERY_FOLDER_TAGGER_ENABLE to "true". Once enabled, it works automatically for all filesystem-based data sources (S3, GCS, ADLS).

Enable the FolderLevelTagger Service

To enable the service and customize tag propagation behavior:

  1. SSH into the instance where Privacera Manager is installed.
  2. Navigate to the privacera-manager directory:

    Bash
    cd ~/privacera/privacera-manager
    

  3. Open the Discovery custom-vars file for your cloud provider (AWS, Azure, or GCP, in that order):

    Bash
    vi config/custom-vars/vars.discovery.aws.yml
    
    Bash
    vi config/custom-vars/vars.discovery.azure.yml
    
    Bash
    vi config/custom-vars/vars.discovery.gcp.yml
    

    Add or update the following variables:

    YAML
    # Enable the folder tagger background service for tag propagation
    DISCOVERY_FOLDER_TAGGER_ENABLE: "true"
    
    # Sleep time between batch processing in milliseconds (default: 10000)
    DISCOVERY_FOLDER_TAGGER_SLEEP_TIME_MS: "10000"
    
    # Number of resources to process in each batch (default: 100)
    DISCOVERY_FOLDER_TAGGER_BATCH_SIZE: "100"
    
    # Backoff time for Solr results to settle in seconds (default: 120)
    DISCOVERY_FOLDER_TAGGER_BACKOFF_TIME_SEC: "120"
    
    # Enable quick scan mode to sample part files (default: false)
    DISCOVERY_QUICK_SCAN_ENABLE: "true"
    
    # Number of part files to randomly sample per parent folder (default: 10)
    # Recommended values: 5-10 for fast scans, 20-50 for better accuracy
    DISCOVERY_QUICK_SCAN_LIMIT: "10"
    
  4. Save the file and exit the editor.
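
Before restarting, it can help to confirm that the properties were actually saved. The sketch below is one way to do that with standard `grep`; the function name `show_tagger_vars` is a hypothetical helper and not part of Privacera Manager. It only prints uncommented lines whose names start with `DISCOVERY_FOLDER_TAGGER_` or `DISCOVERY_QUICK_SCAN_`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Privacera Manager):
# print the folder tagger and quick scan settings found in a vars file.
show_tagger_vars() {
  local vars_file="$1"
  # Match only uncommented lines that start with one of the two property prefixes.
  grep -E '^[[:space:]]*DISCOVERY_(FOLDER_TAGGER|QUICK_SCAN)_[A-Z_]+:' "$vars_file"
}

# Example call, using the AWS file name from step 3:
#   show_tagger_vars config/custom-vars/vars.discovery.aws.yml
```

If the function prints nothing, the properties are either missing or still commented out in the file you passed.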

Optional: Performance Tuning

The following parameters can be adjusted to tune performance:

  • Sleep Time (DISCOVERY_FOLDER_TAGGER_SLEEP_TIME_MS): Increase to reduce CPU usage; decrease for faster processing.
  • Batch Size (DISCOVERY_FOLDER_TAGGER_BATCH_SIZE): Increase for higher throughput; decrease if you experience memory issues.
  • Backoff Time (DISCOVERY_FOLDER_TAGGER_BACKOFF_TIME_SEC): Increase if you see inconsistent tagging; decrease for faster updates.
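
To get a feel for how sleep time and batch size interact, a back-of-the-envelope estimate can be useful. The sketch below assumes the service handles one batch and then sleeps for the configured interval, and it ignores the time spent processing each batch; both are simplifying assumptions rather than documented behavior, so treat the result as a rough upper bound. The function name `estimate_per_minute` is made up for illustration.

```shell
#!/usr/bin/env bash
# Rough upper bound on resources handled per minute.
# Assumption: one batch per sleep interval, with processing time ignored.
estimate_per_minute() {
  local batch_size="$1" sleep_ms="$2"
  # 60000 ms per minute divided by the sleep interval gives batches per minute.
  echo $(( batch_size * 60000 / sleep_ms ))
}

estimate_per_minute 100 10000   # default batch size and sleep time
estimate_per_minute 200 5000    # larger batches, shorter sleep
```

Under these assumptions the defaults give at most 600 resources per minute, and doubling the batch size while halving the sleep time raises that ceiling to 2400, at the cost of more CPU and memory use.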

Restart Privacera Services

After saving the configuration, run the following commands to apply the changes and restart Discovery:

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh setup
./pm_with_helm.sh upgrade