

Databricks Spark Plug-in (Python/SQL)#

These instructions describe how to install the Privacera Spark plug-in in Databricks on GCP.


Ensure the following prerequisite is met:

  • All the Privacera core (default) services should be installed and running.


  1. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.databricks.plugin.yml config/custom-vars/
    vi config/custom-vars/vars.databricks.plugin.yml
  2. Update the DATABRICKS_MANAGE_INIT_SCRIPT property, since the init script will be uploaded manually to GCP Cloud Storage in a later step.

  3. Run the following commands.

    cd ~/privacera/privacera-manager
    ./ update

    After the update completes, the init script and the Privacera custom configuration file for SSL are generated at ~/privacera/privacera-manager/output/databricks.
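The DATABRICKS_MANAGE_INIT_SCRIPT setting from step 2 can be sketched as the following fragment of the vars file. The value "false" is an assumption based on the manual-upload workflow described above; confirm it against the comments in your copy of vars.databricks.plugin.yml.

```yaml
# config/custom-vars/vars.databricks.plugin.yml (illustrative fragment)
# "false" (assumed) tells Privacera Manager not to manage the init script
# itself, because we upload it to GCP Cloud Storage manually instead.
DATABRICKS_MANAGE_INIT_SCRIPT: "false"
```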

Custom Configuration File

(Recommended) Perform the following steps only if HTTPS is enabled for Ranger:

Upload the custom configuration file to a storage bucket in GCP and copy its public URL (for example, ${PUBLIC_GCS_BUCKET}/, where ${PUBLIC_GCS_BUCKET} is the GCP bucket name). The init script uses this URL to download the file to the Databricks cluster.

Managing init Script and Spark Configurations

  1. Run the following command.

    cd ~/privacera/privacera-manager/output/databricks
  2. In the CUST_CONF_URL property, add the public URL of the GCP storage bucket where you placed the custom configuration file.

  3. Upload the init script to your Google Cloud Storage account and copy the file path of the script. For example, gs://privacera/dev/init/

  4. Log on to the Databricks console with your account and open the target cluster or create a new cluster.

  5. Open the Cluster dialog and go to Edit mode.

  6. Open Advanced Options and select the Init Scripts tab. Paste the file path from step 3 as the init script location, then save (confirm) this configuration.

  7. Open Advanced Options and select the Spark tab. Add the following content to the Spark Config edit box:

    spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
    spark.databricks.isv.product privacera
    spark.databricks.pyspark.enableProcessIsolation false
    privacera.spark.view.levelmaskingrowfilter.extension.enable true
  8. Save (Confirm) this configuration.

  9. Start (or Restart) the selected Databricks Cluster.
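The Spark Config edit box takes one property per line, with the key and value separated by whitespace. As an illustration only (not part of the installation), a small Python sketch that parses the block from step 7 into a dictionary, the way such key/value lines are conventionally read:

```python
# The exact lines from the Spark Config edit box in step 7.
SPARK_CONF_TEXT = """\
spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
spark.databricks.isv.product privacera
spark.databricks.pyspark.enableProcessIsolation false
privacera.spark.view.levelmaskingrowfilter.extension.enable true
"""

def parse_spark_conf(text: str) -> dict:
    """Split each non-empty line into a (key, value) pair on the first space."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf

conf = parse_spark_conf(SPARK_CONF_TEXT)
```

The parsed result shows, for example, that process isolation is disabled ("false") and the Privacera agent is attached to the driver JVM via -javaagent.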


To help evaluate Privacera with Databricks, Privacera provides a set of Privacera Manager 'demo' notebooks. These can be downloaded from the Privacera S3 repository using either your browser or the command-line 'wget'. Use the notebook/SQL sequence that matches your cluster.

  1. Download using your browser (click the correct file for your cluster below):

    If AWS S3 is configured from your Databricks cluster:

    If ADLS Gen2 is configured from your Databricks cluster:

    Or, if you are working from a Linux command line, use wget to download:

    wget -O PrivaceraSparkPlugin.sql

    wget -O PrivaceraSparkPluginS3.sql

    wget -O PrivaceraSparkPluginADLS.sql

  2. Import the Databricks notebook:

    • Log in to the Databricks console.
    • Select Workspace -> Users -> your user.
    • Click the drop-down menu.
    • Click Import and choose the downloaded file.
  3. Follow the suggested steps in the text of the notebook to exercise and validate Privacera with Databricks.
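The console import in step 2 can also be scripted against the Databricks Workspace API (POST /api/2.0/workspace/import), which takes the notebook source base64-encoded. A minimal sketch, assuming you have a workspace host and personal access token available; the workspace path and file names are illustrative:

```python
import base64

def build_import_payload(local_file: str, workspace_path: str) -> dict:
    """Build the JSON body for POST /api/2.0/workspace/import (SOURCE format).

    The notebook source must be base64-encoded; format/language tell the
    workspace to import it as a SQL source notebook.
    """
    with open(local_file, "rb") as f:
        content = base64.b64encode(f.read()).decode("ascii")
    return {
        "path": workspace_path,        # illustrative target path
        "format": "SOURCE",
        "language": "SQL",
        "overwrite": True,
        "content": content,
    }

# Example call (requires the `requests` package and valid credentials;
# DATABRICKS_HOST / DATABRICKS_TOKEN are assumed environment variables):
# import os, requests
# payload = build_import_payload("PrivaceraSparkPlugin.sql",
#                                "/Users/me@example.com/PrivaceraSparkPlugin")
# requests.post(f"{os.environ['DATABRICKS_HOST']}/api/2.0/workspace/import",
#               headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
#               json=payload)
```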