Privacera Plugin in Dataproc
This section covers how to use Privacera Manager to generate the setup script and Dataproc custom configuration needed to install the Privacera Plugin in a GCP Dataproc environment.
Prerequisites
Ensure the following prerequisites are met:
-
A working Dataproc environment.
-
Privacera services must be up and running.
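As a quick preflight, it can help to confirm that the command-line tools used later in this section are on the PATH before you start. The following is a minimal sketch, not part of Privacera Manager: `require_cmds` is a hypothetical helper name, and the list of tools to check (for example `gsutil`) is an assumption based on the commands shown in this section.

```shell
# Hypothetical helper: report any named command that is not on PATH.
# Returns 0 if all commands are found, 1 if at least one is missing.
require_cmds() {
  rc_missing=0
  for rc_cmd in "$@"; do
    if ! command -v "$rc_cmd" >/dev/null 2>&1; then
      echo "missing: $rc_cmd" >&2
      rc_missing=1
    fi
  done
  return "$rc_missing"
}
```

For example, `require_cmds gsutil || echo "install the Google Cloud SDK first"` could be run on both the Privacera host and the Dataproc master node.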
Configuration
-
SSH to the instance where Privacera is installed.
-
Run the following commands:
    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.dataproc.yml config/custom-vars/
    vi config/custom-vars/vars.dataproc.yml
-
Edit the following properties:
| Property | Description | Example |
|---|---|---|
| DATAPROC_ENABLE | Enable Dataproc template creation. | true |
| DATAPROC_MANAGE_INIT_SCRIPT | Controls whether Privacera uploads the init script to GCP Cloud Storage. If set to true, Privacera uploads the init script to the GCP bucket. If set to false, you must upload the init script to a GCP bucket manually. | false |
| DATAPROC_PRIVACERA_GS_BUCKET | Enter the GCP bucket name where the init script will be uploaded. | gs://privacera-bucket |
| DATAPROC_RANGER_IS_FALLBACK_SUPPORTED | Enables or disables the fallback behavior to the privacera_files and privacera_hive services. It determines whether users are allowed or denied access to the resource files. To enable the fallback, set to true; to disable, set to false. | true |
-
Run the update.
    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
After the update is complete, the setup script setup_dataproc.sh and the Dataproc custom configurations privacera_cust_conf.zip will be generated at the path ~/privacera/privacera-manager/output/dataproc.
-
If DATAPROC_MANAGE_INIT_SCRIPT is set to false, copy setup_dataproc.sh and privacera_cust_conf.zip to the GCP bucket manually. Both files must be placed in the same folder.
    cd ~/privacera/privacera-manager/output/dataproc
    GS_BUCKET=<PLEASE_CHANGE>
    gsutil cp setup_dataproc.sh gs://${GS_BUCKET}/privacera/dataproc/init/
    gsutil cp privacera_cust_conf.zip gs://${GS_BUCKET}/privacera/dataproc/init/
-
SSH to the master node of the Dataproc cluster. Then, set the GCP bucket name and run the setup script.
    sudo su -
    mkdir -p /opt/privacera/downloads
    cd /opt/privacera/downloads
    GS_BUCKET=privacera-dev
    gsutil cp gs://${GS_BUCKET}/privacera/dataproc/init/setup_dataproc.sh .
    chmod +x setup_dataproc.sh
    ./setup_dataproc.sh
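
The commands in the steps above build the upload location as gs://${GS_BUCKET}/privacera/dataproc/init/, so GS_BUCKET is expected to hold a bare bucket name, whereas the example value for DATAPROC_PRIVACERA_GS_BUCKET carries a gs:// prefix. The sketch below shows one way to accept either form when scripting these steps. `init_folder_url` is a hypothetical helper name introduced here for illustration; it is not part of the Privacera scripts.

```shell
# Hypothetical helper: build the init-script folder URL from a bucket name.
# Accepts "privacera-dev" or "gs://privacera-dev" (with or without one
# trailing slash) so the result never becomes "gs://gs://...".
# The body runs in a subshell, so the variable does not leak to the caller.
init_folder_url() (
  bucket=${1#gs://}   # drop a leading gs:// if present
  bucket=${bucket%/}  # drop one trailing slash if present
  printf 'gs://%s/privacera/dataproc/init/\n' "$bucket"
)
```

With this in place, the manual upload could be written as `gsutil cp setup_dataproc.sh "$(init_folder_url "$GS_BUCKET")"`, and the same call can build the source folder on the master node, which keeps both files in the same location.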