Privacera Documentation

Connect GCP Dataproc to Privacera Platform using Privacera plugin

You can use Privacera Manager to generate the setup script and the Dataproc custom configuration needed to install the Privacera plugin in a GCP Dataproc environment.

Prerequisites

Ensure the following prerequisites are met:

  • Your GCP Dataproc cluster is up and running

  • Your Privacera services are up and running

Procedure
  1. SSH to the instance where Privacera is installed.

  2. Run the following commands to copy the sample Dataproc variables file into config/custom-vars and open it for editing:

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.dataproc.yml config/custom-vars/
    vi config/custom-vars/vars.dataproc.yml
  3. Edit the following properties:

    DATAPROC_ENABLE
        Description: Enables Dataproc template creation.
        Example: true

    DATAPROC_MANAGE_INIT_SCRIPT
        Description: Controls whether Privacera uploads the init script to GCP Cloud Storage. If set to true, Privacera uploads the init script to the GCP bucket. If set to false, you must upload the init script to a GCP bucket manually (step 5).
        Example: false

    DATAPROC_PRIVACERA_GS_BUCKET
        Description: The GCP bucket to which the init script is uploaded.
        Example: gs://privacera-bucket

    DATAPROC_RANGER_IS_FALLBACK_SUPPORTED
        Description: Enables or disables the fallback behavior for the privacera_files and privacera_hive services, which determines whether users are allowed or denied access to resource files. Set to true to enable the fallback or false to disable it.
        Example: true
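
    After saving the file, you can confirm that all four properties are set. The following is a minimal sketch, not a Privacera Manager tool: check_dataproc_vars is a hypothetical helper, and it assumes each property appears on its own line as a top-level YAML key (for example, DATAPROC_ENABLE: "true"). It prints each property line it finds and returns a non-zero status if any property is missing or commented out.

```shell
# Hypothetical helper (not part of Privacera Manager).
# ASSUMPTION: each property is an uncommented top-level YAML key on its own
# line, e.g. DATAPROC_ENABLE: "true".
check_dataproc_vars() {
  local file="$1" missing=0 key line
  for key in DATAPROC_ENABLE DATAPROC_MANAGE_INIT_SCRIPT \
             DATAPROC_PRIVACERA_GS_BUCKET DATAPROC_RANGER_IS_FALLBACK_SUPPORTED; do
    # Take the last uncommented "KEY:" line, if any.
    line=$(grep -E "^${key}[[:space:]]*:" "$file" | tail -n 1)
    if [ -z "$line" ]; then
      echo "MISSING: ${key}" >&2
      missing=1
    else
      echo "$line"
    fi
  done
  return "$missing"
}
```

    For example, from ~/privacera/privacera-manager run: check_dataproc_vars config/custom-vars/vars.dataproc.yml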

  4. Run the update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update

    After the update completes, the setup script (setup_dataproc.sh) and the Dataproc custom configuration (privacera_cust_conf.zip) are generated in ~/privacera/privacera-manager/output/dataproc.
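
    Before continuing, you can check that both artifacts were generated. This is a minimal sketch using a hypothetical helper, check_dataproc_output, which defaults to the output path named in this step and fails if either file is missing or empty.

```shell
# Hypothetical helper: verify the two artifacts produced by
# ./privacera-manager.sh update. The default directory is the output path
# documented for this step; pass another directory as $1 to override it.
check_dataproc_output() {
  local dir="${1:-$HOME/privacera/privacera-manager/output/dataproc}"
  local f rc=0
  for f in setup_dataproc.sh privacera_cust_conf.zip; do
    # -s: the file exists and is larger than zero bytes.
    if [ -s "${dir}/${f}" ]; then
      echo "found: ${dir}/${f}"
    else
      echo "missing or empty: ${dir}/${f}" >&2
      rc=1
    fi
  done
  return "$rc"
}
```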

  5. If DATAPROC_MANAGE_INIT_SCRIPT is set to false, copy setup_dataproc.sh and privacera_cust_conf.zip to your GCP bucket manually. Both files must be placed in the same folder.

    cd ~/privacera/privacera-manager/output/dataproc
    # Replace <PLEASE_CHANGE> with your bucket name, without the gs:// prefix
    GS_BUCKET=<PLEASE_CHANGE>
    gsutil cp setup_dataproc.sh gs://${GS_BUCKET}/privacera/dataproc/init/
    gsutil cp privacera_cust_conf.zip gs://${GS_BUCKET}/privacera/dataproc/init/
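
    GS_BUCKET is used as a bare bucket name inside gs://${GS_BUCKET}/..., while the DATAPROC_PRIVACERA_GS_BUCKET example includes the gs:// prefix, so it is easy to end up with gs://gs://... paths. As a minimal sketch, the hypothetical helper dataproc_init_uri below builds the init folder URI used in steps 5 and 6 from either form and refuses the unchanged <PLEASE_CHANGE> placeholder.

```shell
# Hypothetical helper: build gs://<bucket>/privacera/dataproc/init/ from a
# bare bucket name or a gs:// URI. Fails on an empty value or the
# unchanged <PLEASE_CHANGE> placeholder.
dataproc_init_uri() {
  local bucket="${1#gs://}"   # drop an optional gs:// prefix
  bucket="${bucket%/}"        # drop one trailing slash
  if [ -z "$bucket" ] || [ "$bucket" = "<PLEASE_CHANGE>" ]; then
    echo "set GS_BUCKET to your GCP bucket name" >&2
    return 1
  fi
  echo "gs://${bucket}/privacera/dataproc/init/"
}
```

    With the helper defined, both files can be copied in one command, which gsutil cp supports when the destination is a folder URI:

    INIT_URI=$(dataproc_init_uri "${GS_BUCKET}") && gsutil cp setup_dataproc.sh privacera_cust_conf.zip "${INIT_URI}"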
  6. SSH to the master node of the Dataproc cluster, set GS_BUCKET to your GCP bucket name, and run the setup script.

    sudo su -
    mkdir -p /opt/privacera/downloads
    cd /opt/privacera/downloads
    # Replace <PLEASE_CHANGE> with your bucket name, for example privacera-dev
    GS_BUCKET=<PLEASE_CHANGE>
    gsutil cp gs://${GS_BUCKET}/privacera/dataproc/init/setup_dataproc.sh .
    chmod +x setup_dataproc.sh
    ./setup_dataproc.sh
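
    Typed line by line, the commands above still run ./setup_dataproc.sh even if the gsutil download fails. A minimal sketch of the same step as one function that stops at the first failure is shown below; run_dataproc_setup is a hypothetical helper, it assumes the bucket layout used in steps 5 and 6, and it is meant to be run as root on the Dataproc master node.

```shell
# Hypothetical helper: download and run setup_dataproc.sh, stopping at the
# first failing command. The body runs in a subshell "( ... )" so the cd
# does not change the caller's working directory.
run_dataproc_setup() (
  bucket="${1:?usage: run_dataproc_setup BUCKET [DOWNLOAD_DIR]}"
  dir="${2:-/opt/privacera/downloads}"
  mkdir -p "$dir" &&
    cd "$dir" &&
    gsutil cp "gs://${bucket}/privacera/dataproc/init/setup_dataproc.sh" . &&
    chmod +x setup_dataproc.sh &&
    ./setup_dataproc.sh
)
```

    For example: run_dataproc_setup privacera-dev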