
Setup for Access Management for Databricks all-purpose compute clusters with Object-Level Access Control (OLAC)

Configure

Perform the following steps to configure Databricks OLAC:

  1. SSH to the instance where Privacera Manager is installed.

  2. Run the following commands to navigate to the config directory and copy the yml files:

    Bash
    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.databricks.scala.yml custom-vars/vars.databricks.scala.yml
    cp -n sample-vars/vars.databricks.plugin.yml custom-vars/vars.databricks.plugin.yml
    

  3. Modify the following properties:
    • No modification is required in vars.databricks.scala.yml.
    • In the vars.databricks.plugin.yml file, update the following properties with the appropriate values:
      YAML
      DATABRICKS_HOST_URL: "<PLEASE_UPDATE>"
      DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
      
  4. Once the properties are configured, run the Privacera Manager post-install action. Refer to the Privacera Manager post-install documentation for details.
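Before running the post-install action, it can help to sanity-check the two values you placed in vars.databricks.plugin.yml. The helper below is an illustrative sketch, not part of Privacera Manager; it only checks the shape of the settings.

```python
def validate_databricks_settings(host_url: str, token: str) -> list:
    """Return a list of problems found; an empty list means the values look sane."""
    problems = []
    if not host_url.startswith("https://"):
        problems.append("DATABRICKS_HOST_URL should start with https://")
    if "<PLEASE_UPDATE>" in (host_url, token):
        problems.append("a placeholder value was left unchanged")
    if not token:
        problems.append("DATABRICKS_TOKEN is empty")
    return problems

# A placeholder that was never replaced is flagged:
print(validate_databricks_settings("https://adb-123.4.azuredatabricks.net", "<PLEASE_UPDATE>"))
```

The host URL here is an example value; use your own workspace URL.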

Enable Databricks Application

  1. In PrivaceraCloud, go to Settings -> Applications.
  2. On the Applications screen, select Databricks.
  3. Select the Platform Type.
  4. Enter the application Name and Description, then click Save. The name can be any name of your choice, for example, Databricks.
  5. Open the Databricks application.
  6. Enable the Access Management option using the toggle button.
  7. Click the Save button.

Download Script from PrivaceraCloud

  1. Open the Databricks application.
  2. Click Access Management.
  3. Click Download Script.
  4. Save the script locally as privacera_databricks.sh.

Create the Init Script as a Databricks Workspace File

  1. Log in to the Databricks Web UI.
  2. Click Workspace in the sidebar.
  3. Click the Workspace folder.
  4. Click Create -> Folder.
  5. Name the folder privacera.
  6. Click Create.
  7. Open the privacera folder.
  8. Click Create -> File.
  9. Enter the new file name privacera_databricks.sh.
  10. Copy the content of the local privacera_databricks.sh and paste it into the workspace file privacera_databricks.sh.
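The UI steps above can also be scripted against the public Databricks Workspace API (POST /api/2.0/workspace/import), which takes the file content base64-encoded. The sketch below only builds the request body so it can be inspected; the actual HTTP call, authentication, and target path are left to your environment.

```python
import base64
import json

def workspace_import_payload(local_script: bytes, target_path: str) -> dict:
    """Build the JSON body for POST /api/2.0/workspace/import."""
    return {
        "path": target_path,        # e.g. /privacera/privacera_databricks.sh
        "format": "AUTO",           # let Databricks infer the object type
        "overwrite": True,          # replace the file if it already exists
        "content": base64.b64encode(local_script).decode("ascii"),
    }

payload = workspace_import_payload(b"#!/bin/bash\n", "/privacera/privacera_databricks.sh")
print(json.dumps(payload, indent=2))
```

You would POST this body to <DATABRICKS_HOST_URL>/api/2.0/workspace/import with an Authorization: Bearer token header.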

Create Databricks cluster policy

We recommend using a Databricks cluster policy to control the cluster configuration. Follow these steps to create a cluster policy:

  • Log in to the Databricks Web UI.
  • Click the Compute icon in the sidebar.
  • Click the Policies tab.
  • Click the Create policy button.
  • Provide a name for the policy, e.g., privacera-olac-cluster-policy.
  • Add the policy definition below, replacing <instance-profile-arn> with your actual instance profile ARN:
    JSON
    {
      "spark_conf.spark.databricks.isv.product": {
        "type": "fixed",
        "value": "privacera"
      },
      "spark_conf.spark.driver.extraJavaOptions": {
        "type": "fixed",
        "value": "-javaagent:/databricks/jars/privacera-agent.jar"
      },
      "spark_conf.spark.executor.extraJavaOptions": {
        "type": "fixed",
        "value": "-javaagent:/databricks/jars/privacera-agent.jar"
      },
      "spark_conf.spark.databricks.repl.allowedLanguages": {
        "type": "fixed",
        "value": "sql,python,r,scala"
      },
      "spark_conf.spark.databricks.delta.formatCheck.enabled": {
        "type": "fixed",
        "value": "false"
      },
      "aws_attributes.instance_profile_arn": {
        "type": "fixed",
        "value": "<instance-profile-arn>"
      },
      "spark_env_vars.PRIVACERA_PLUGIN_TYPE": {
        "type": "fixed",
        "value": "OLAC"
      }
    }
    

instance-profile-arn is optional

This IAM role should not have access to the S3 buckets that are managed by Privacera Dataserver.
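If you generate or edit the policy definition programmatically, a quick structural check can catch mistakes before the policy is saved. The snippet below is an illustrative helper (not a Privacera or Databricks tool) that verifies the rules most important for OLAC: every entry is "fixed", the Privacera agent is injected on both driver and executors, and the plugin type is OLAC. Only a subset of the full policy is shown.

```python
AGENT = "-javaagent:/databricks/jars/privacera-agent.jar"

# Subset of the cluster policy definition from the steps above.
POLICY = {
    "spark_conf.spark.databricks.isv.product": {"type": "fixed", "value": "privacera"},
    "spark_conf.spark.driver.extraJavaOptions": {"type": "fixed", "value": AGENT},
    "spark_conf.spark.executor.extraJavaOptions": {"type": "fixed", "value": AGENT},
    "spark_env_vars.PRIVACERA_PLUGIN_TYPE": {"type": "fixed", "value": "OLAC"},
}

def check_policy(policy: dict) -> bool:
    """True if the OLAC-critical rules are present, fixed, and correct."""
    return (
        all(rule.get("type") == "fixed" for rule in policy.values())
        and policy["spark_conf.spark.driver.extraJavaOptions"]["value"] == AGENT
        and policy["spark_conf.spark.executor.extraJavaOptions"]["value"] == AGENT
        and policy["spark_env_vars.PRIVACERA_PLUGIN_TYPE"]["value"] == "OLAC"
    )

print(check_policy(POLICY))
```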

Create Databricks cluster

Here are the steps to create a Databricks cluster with the Privacera plugin (OLAC):

Cluster name restrictions

Cluster names can use underscores (_) as separators. Avoid other special characters and spaces.

  1. Log in to the Databricks Web UI.
  2. Click the Compute icon in the sidebar.
  3. Click the Create Compute button.
  4. Fill in the cluster configuration details.
  5. From the Policy dropdown, select the cluster policy created in the previous step (e.g., privacera-olac-cluster-policy).
  6. Under Advanced options:
    • Select Workspace as the source for init scripts.
    • Specify the workspace file path:
      • Self Managed and Data Plane
        • /privacera/{DEPLOYMENT_ENV_NAME}/ranger_enable_scala.sh
      • PrivaceraCloud
        • /privacera/{DEPLOYMENT_ENV_NAME}/privacera_databricks.sh
  7. Click the Add button.
  8. Click Create Compute to create the cluster.
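The same cluster can be created through the public Databricks Clusters API (POST /api/2.0/clusters/create). The sketch below only assembles the request body; the cluster name, runtime version, and node type are illustrative values, and the environment name stands in for your actual {DEPLOYMENT_ENV_NAME}.

```python
def olac_cluster_payload(policy_id: str, env_name: str) -> dict:
    """Build the JSON body for POST /api/2.0/clusters/create (illustrative values)."""
    return {
        "cluster_name": "privacera_olac_cluster",  # underscores only, per the naming note
        "spark_version": "13.3.x-scala2.12",       # example runtime version
        "node_type_id": "i3.xlarge",               # example node type
        "num_workers": 1,
        "policy_id": policy_id,                    # the privacera-olac-cluster-policy ID
        "init_scripts": [
            # Workspace-file init script, matching step 6 above (PrivaceraCloud path).
            {"workspace": {"destination": f"/privacera/{env_name}/privacera_databricks.sh"}}
        ],
    }

payload = olac_cluster_payload("ABC123", "example-env")
```

The spark_conf and spark_env_vars entries are omitted here because the cluster policy referenced by policy_id fixes them.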

Validation

To confirm the successful association of an access management policy to data in your Databricks installation, follow these steps:

Ranger Policy Repo

It uses the privacera_s3 repository for access control.

  1. Prerequisites:

    • A running Databricks cluster secured as described in the previous steps.
    • At least one resource policy associated with your data that grants a user access to the S3 path.
    • Sample Resource Policy:

      • Policy Name: sample_csv_read
      • Bucket Name: infraqa-test
      • Object Path: data/data/format=csv/sample/sample.csv
      • Permissions: READ
      • Select User: emily


  2. Steps to Validate Policy:

    • Log in to Databricks as a user defined in the resource policy.
    • Create a new notebook or open an existing one, and attach it to the running Databricks cluster.
    • In the notebook, run the following Scala command to read a CSV file from S3:
      Scala
      spark.read.csv("s3://<bucket-name>/<file-path>/sample.csv")
      
    • On the Privacera portal, navigate to Access Management -> Audits.
    • Review the results to determine the success or failure of the resource policy. A successful outcome indicates that the policy was enforced correctly.
    • Sample Audit Logs:

