Privacera Plugin in Dataproc

This section describes how to use Privacera Manager to generate the setup script and the Dataproc custom configuration needed to install the Privacera Plugin in a GCP Dataproc environment.

Prerequisites

Ensure the following prerequisites are met:

  • A working Dataproc environment.

  • Privacera services must be up and running.

Configuration

  1. SSH to the instance where Privacera is installed.

  2. Run the following commands to copy the sample Dataproc variables file and open it for editing:

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.dataproc.yml config/custom-vars/
    vi config/custom-vars/vars.dataproc.yml
    
  3. Edit the following properties:

    Property: DATAPROC_ENABLE
    Description: Enables Dataproc template creation.
    Example: true

    Property: DATAPROC_MANAGE_INIT_SCRIPT
    Description: Controls whether Privacera uploads the init script to GCP Cloud Storage. If set to true, Privacera uploads the init script to the GCP bucket. If set to false, you must upload the init script to a GCP bucket manually.
    Example: false

    Property: DATAPROC_PRIVACERA_GS_BUCKET
    Description: The name of the GCP bucket where the init script will be uploaded.
    Example: gs://privacera-bucket

    Property: DATAPROC_RANGER_IS_FALLBACK_SUPPORTED
    Description: Enables or disables the fallback to the privacera_files and privacera_hive services, which determines whether the user is allowed or denied access to the resource files. Set to true to enable the fallback, or false to disable it.
    Example: true
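
Before running the update, it can help to confirm that the edited properties hold the kind of values described above. The following is a minimal sketch, not part of Privacera Manager: the function names get_var and check_dataproc_vars are hypothetical, and it assumes each property sits on its own line in the form KEY: "value".

```shell
#!/usr/bin/env bash
# Sketch only: get_var and check_dataproc_vars are illustrative helpers.
# Assumption: the vars file stores each property as a top-level line, KEY: "value".

# Print the value of KEY from FILE with surrounding quotes removed.
get_var() {
  local key="$1" file="$2"
  sed -n "s/^${key}:[[:space:]]*//p" "$file" | head -n 1 | tr -d "\"'" | sed 's/[[:space:]]*$//'
}

# Report each Dataproc property; return 1 if any value looks wrong or is missing.
check_dataproc_vars() {
  local file="$1" status=0 key value
  for key in DATAPROC_ENABLE DATAPROC_MANAGE_INIT_SCRIPT DATAPROC_RANGER_IS_FALLBACK_SUPPORTED; do
    value="$(get_var "$key" "$file")"
    case "$value" in
      true|false) echo "OK   ${key}=${value}" ;;
      *) echo "WARN ${key} should be true or false, found '${value}'"; status=1 ;;
    esac
  done
  value="$(get_var DATAPROC_PRIVACERA_GS_BUCKET "$file")"
  if [ -n "$value" ]; then
    echo "OK   DATAPROC_PRIVACERA_GS_BUCKET=${value}"
  else
    echo "WARN DATAPROC_PRIVACERA_GS_BUCKET is not set"
    status=1
  fi
  return "$status"
}
```

To use it, load the functions into your shell and run check_dataproc_vars config/custom-vars/vars.dataproc.yml from ~/privacera/privacera-manager.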

  4. Run the update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    

    After the update is complete, the setup script (setup_dataproc.sh) and the Dataproc custom configuration archive (privacera_cust_conf.zip) are generated in ~/privacera/privacera-manager/output/dataproc.

  5. If DATAPROC_MANAGE_INIT_SCRIPT is set to false, manually copy setup_dataproc.sh and privacera_cust_conf.zip to the GCP bucket. Both files must be placed in the same folder.

    cd ~/privacera/privacera-manager/output/dataproc
    GS_BUCKET="<PLEASE_CHANGE>"   # replace with your bucket name, without the gs:// prefix
    gsutil cp setup_dataproc.sh gs://${GS_BUCKET}/privacera/dataproc/init/
    gsutil cp privacera_cust_conf.zip gs://${GS_BUCKET}/privacera/dataproc/init/
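
The manual copy in this step can also be wrapped in a small helper that builds the destination folder once and lists it afterwards, so you can see that both files landed in the same place. This is a rough sketch under stated assumptions: init_dir_uri and upload_init_files are hypothetical names, and the helper accepts the bucket name with or without a gs:// prefix purely as a convenience.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not part of Privacera Manager.

# Build the init folder URI from a bucket name, with or without a gs:// prefix.
init_dir_uri() {
  local bucket="${1#gs://}"   # drop a leading gs:// if present
  bucket="${bucket%/}"        # drop one trailing slash if present
  printf 'gs://%s/privacera/dataproc/init/\n' "$bucket"
}

# Copy both generated files to the same folder, then list it to confirm.
# Run this from ~/privacera/privacera-manager/output/dataproc.
upload_init_files() {
  local dest
  dest="$(init_dir_uri "$1")"
  gsutil cp setup_dataproc.sh privacera_cust_conf.zip "$dest" || return 1
  gsutil ls "$dest"
}
```

For example, upload_init_files privacera-bucket copies both files to gs://privacera-bucket/privacera/dataproc/init/ and prints the folder listing.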
    
  6. SSH to the master node of the Dataproc cluster. Then, set the GCP bucket name and run the setup script.

    sudo su - 
    mkdir -p /opt/privacera/downloads
    cd /opt/privacera/downloads
    GS_BUCKET=privacera-dev
    gsutil cp gs://${GS_BUCKET}/privacera/dataproc/init/setup_dataproc.sh .
    chmod +x setup_dataproc.sh
    ./setup_dataproc.sh
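
The commands in step 6 can be combined into a single guarded function that stops early if the bucket name is missing or still a placeholder. This is only a sketch: validate_bucket and run_dataproc_setup are hypothetical names, and the function assumes it is run as root on the master node with gsutil available.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the step 6 commands with basic guards.
# validate_bucket and run_dataproc_setup are illustrative names.

# Return 0 when the bucket name looks usable, 1 otherwise.
validate_bucket() {
  case "$1" in
    ""|*"<"*|*">"*|gs://*) return 1 ;;   # empty, placeholder, or already prefixed
    *) return 0 ;;
  esac
}

# Download setup_dataproc.sh from the bucket and run it.
run_dataproc_setup() {
  local bucket="$1" workdir="/opt/privacera/downloads"
  if ! validate_bucket "$bucket"; then
    echo "Invalid bucket name: '${bucket}'" >&2
    return 1
  fi
  mkdir -p "$workdir" || return 1
  (
    # Subshell keeps the caller's working directory unchanged.
    cd "$workdir" || exit 1
    gsutil cp "gs://${bucket}/privacera/dataproc/init/setup_dataproc.sh" . || exit 1
    chmod +x setup_dataproc.sh
    ./setup_dataproc.sh
  )
}
```

For example, run_dataproc_setup privacera-dev performs the same download and run sequence as the commands above.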