
Setup for Access Management for Databricks all-purpose compute clusters with Fine-Grained Access Control (FGAC)

Configure

Perform the following steps to configure Databricks FGAC:

  1. SSH to the instance where Privacera Manager is installed.

  2. Run the following command to navigate to the /config directory.

    Bash
    cd ~/privacera/privacera-manager/config
    

  3. Run the following command to copy the Databricks FGAC plugin YAML file from sample-vars, if it is not already present in custom-vars:

    Bash
    cp -n sample-vars/vars.databricks.plugin.yml custom-vars/vars.databricks.plugin.yml
    

  4. Update the following properties in the vars.databricks.plugin.yml file:

    Bash
    vi custom-vars/vars.databricks.plugin.yml
    
    DATABRICKS_HOST_URL: "<PLEASE_UPDATE>"
    DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
    

  5. Once the properties are configured, run the Privacera Manager post-install action. Refer to the Privacera Manager documentation for the post-install steps.
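
Before running the post-install action, it can help to confirm that no placeholder was left unset. The following is a minimal sketch, not part of Privacera Manager: the function name check_placeholders is hypothetical, and it assumes the placeholder text is exactly <PLEASE_UPDATE>, as in the sample file above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Privacera Manager):
# fail if any <PLEASE_UPDATE> placeholder remains in a vars file.
check_placeholders() {
  local vars_file="$1"
  if [ ! -f "$vars_file" ]; then
    echo "File not found: $vars_file" >&2
    return 2
  fi
  # -n prints matching lines with line numbers; -F treats the pattern literally.
  if grep -nF '<PLEASE_UPDATE>' "$vars_file" >&2; then
    echo "Unset placeholders remain in $vars_file" >&2
    return 1
  fi
  echo "No placeholders left in $vars_file"
}
```

From the config directory, this could be called as `check_placeholders custom-vars/vars.databricks.plugin.yml`; a non-zero exit status means at least one property still needs a value.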

Enable Databricks Application

  1. In PrivaceraCloud, go to Settings -> Applications.
  2. On the Applications screen, select Databricks.
  3. Select the Platform Type.
  4. Enter the application Name and Description, then click Save. The name can be anything you choose, e.g., Databricks.
  5. Open the Databricks application.
  6. Enable the Access Management option using the toggle button.
  7. Click Save.

Download Script from PrivaceraCloud

  1. Open the Databricks application.
  2. Click on Access Management
  3. Click on Download Script
  4. Save the script locally as privacera_databricks.sh

Create Init script in Databricks Workspace File

  1. Log in to the Databricks Web UI
  2. Click on the Workspace sidebar
  3. Click on the Workspace folder
  4. Click on Create -> Folder
  5. Name the folder privacera
  6. Click on Create
  7. Go inside the privacera folder
  8. Click on Create -> File
  9. Enter the new file name as privacera_databricks.sh
  10. Copy the content of the local privacera_databricks.sh and paste it into the workspace file privacera_databricks.sh
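
The same upload could also be scripted instead of done through the UI. The sketch below uses the Databricks Workspace REST API endpoints /api/2.0/workspace/mkdirs and /api/2.0/workspace/import. It is an illustration under assumptions, not a Privacera-provided tool: the function names are hypothetical, DATABRICKS_HOST_URL and DATABRICKS_TOKEN are assumed to be exported in your shell, and the "AUTO" import format is assumed to store a .sh file as a plain workspace file (verify this against the Workspace API reference for your workspace).

```bash
#!/usr/bin/env bash
# Sketch: upload the init script through the Databricks Workspace REST API.
# Assumes DATABRICKS_HOST_URL and DATABRICKS_TOKEN are exported in the shell.

# Build the JSON body for POST /api/2.0/workspace/import.
# The API expects the file content to be base64-encoded.
# Assumption: format "AUTO" imports a .sh file as a workspace file.
build_import_payload() {
  local local_file="$1" workspace_path="$2"
  local content
  content=$(base64 < "$local_file" | tr -d '\n')
  printf '{"path":"%s","format":"AUTO","overwrite":true,"content":"%s"}' \
    "$workspace_path" "$content"
}

# Create the parent folder, then import the file into it.
upload_init_script() {
  local local_file="$1" workspace_path="$2"
  curl --fail --silent --show-error -X POST \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"path\":\"$(dirname "$workspace_path")\"}" \
    "${DATABRICKS_HOST_URL}/api/2.0/workspace/mkdirs" || return 1
  build_import_payload "$local_file" "$workspace_path" |
    curl --fail --silent --show-error -X POST \
      -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
      -H "Content-Type: application/json" \
      -d @- \
      "${DATABRICKS_HOST_URL}/api/2.0/workspace/import"
}
```

A call such as `upload_init_script privacera_databricks.sh /privacera/privacera_databricks.sh` would mirror the manual steps above. Whichever method you use, the workspace path must match the init script path configured on the cluster.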

Create Databricks cluster policy

We recommend using a Databricks cluster policy to control cluster configuration. To create the cluster policy:

  • Log in to Databricks Web UI
  • Click on the Compute icon on the sidebar
  • Click on the Policies tab
  • Click on the Create policy button
  • Provide a name for the policy, e.g., privacera-fgac-cluster-policy
  • Add the policy definition below, replacing <instance-profile-arn> with your actual instance profile ARN
    JSON
    {
      "spark_conf.spark.databricks.isv.product": {
        "type": "fixed",
        "value": "privacera"
      },
      "spark_conf.spark.driver.extraJavaOptions": {
        "type": "fixed",
        "value": "-javaagent:/databricks/jars/privacera-agent.jar"
      },
      "spark_conf.spark.databricks.repl.allowedLanguages": {
        "type": "fixed",
        "value": "sql,python,r"
      },
      "spark_conf.spark.databricks.delta.formatCheck.enabled": {
        "type": "fixed",
        "value": "false"
      },
      "aws_attributes.instance_profile_arn": {
        "type": "fixed",
        "value": "<instance-profile-arn>"
      },
      "spark_conf.spark.sql.extensions": {
        "type": "fixed",
        "value": "com.privacera.spark.agent.SparkSQLExtension"
      },
      "spark_conf.spark.databricks.pyspark.dbconnect.enableProcessIsolation": {
        "type": "fixed",
        "value": "true"
      },
      "spark_conf.spark.databricks.pyspark.enableProcessIsolation": {
        "type": "fixed",
        "value": "true"  
      },
      "spark_conf.spark.databricks.pyspark.enablePy4JSecurity": {
        "type": "fixed",
        "value": "true"
      },
      "spark_conf.spark.hadoop.privacera.custom.current_user.udf.names": {
        "type": "fixed",
        "value": "current_user()"
      }
    }
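
If you keep the policy definition in a local file, the ARN substitution can be scripted so the placeholder is never pasted into Databricks by mistake. This is a minimal sketch: render_policy is a hypothetical helper, the template file is assumed to be a saved copy of the JSON above, and the ARN check only tests the general shape of an IAM instance profile ARN.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the <instance-profile-arn> placeholder in a saved
# copy of the policy definition and print the result to stdout.
render_policy() {
  local template_file="$1" arn="$2"
  # Basic shape check for an IAM instance profile ARN,
  # e.g. arn:aws:iam::123456789012:instance-profile/my-profile
  if ! printf '%s\n' "$arn" |
      grep -Eq '^arn:aws[a-z-]*:iam::[0-9]{12}:instance-profile/[A-Za-z0-9+=,.@_/-]+$'; then
    echo "Not an instance profile ARN: $arn" >&2
    return 1
  fi
  # '|' is used as the sed delimiter because ARNs contain '/'.
  sed "s|<instance-profile-arn>|${arn}|g" "$template_file"
}
```

For example, `render_policy policy-template.json "arn:aws:iam::123456789012:instance-profile/my-profile" > policy.json` would produce a definition ready to paste into the policy editor; the file names and the ARN here are placeholders.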
    

Create Databricks cluster

To create a Databricks cluster with the Privacera plugin (FGAC):

  1. Log in to the Databricks Web UI
  2. Click on the Compute icon on the sidebar
  3. Click the Create Compute button.
  4. Fill in the cluster configuration details.
  5. Under the Cluster Policies dropdown, select the policy you created in the previous step (e.g., privacera-fgac-cluster-policy)
  6. Under Advanced options:
    • Select Workspace as the source for init scripts.
    • Specify the Workspace file path:
      • Self Managed and Data Plane
        • /privacera/{DEPLOYMENT_ENV_NAME}/ranger_enable.sh
      • PrivaceraCloud
        • /privacera/{DEPLOYMENT_ENV_NAME}/privacera_databricks.sh
  7. Click the Add button.
  8. Click Create Compute to create the cluster.
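
The init script path depends on the deployment type and on DEPLOYMENT_ENV_NAME. The sketch below derives it from those two values, following the paths listed in step 6. The function name and the deployment type keywords (self-managed, data-plane, privaceracloud) are hypothetical labels chosen for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the workspace init script path for a deployment type.
# Paths follow the ones listed in the cluster creation steps.
init_script_path() {
  local deployment_type="$1" env_name="$2"
  if [ -z "$env_name" ]; then
    echo "DEPLOYMENT_ENV_NAME must not be empty" >&2
    return 1
  fi
  case "$deployment_type" in
    self-managed|data-plane)
      printf '/privacera/%s/ranger_enable.sh\n' "$env_name" ;;
    privaceracloud)
      printf '/privacera/%s/privacera_databricks.sh\n' "$env_name" ;;
    *)
      echo "Unknown deployment type: $deployment_type" >&2
      return 1 ;;
  esac
}
```

Calling `init_script_path privaceracloud "$DEPLOYMENT_ENV_NAME"` prints the value to enter in the Workspace file path field.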

Validation

To confirm the successful association of an access management policy with data in your Databricks installation, follow these steps:

  1. Prerequisites:

    • A running Databricks cluster secured using the steps mentioned above.
    • At least one resource policy associated with your data that grants a user access to the database.
    • This resource policy must not target the Databricks default database. Configure the policy for any database other than the default.
    • Sample Resource Policy:

      • Policy Name: Employee Details
      • Hive Database: sample_db
      • Hive Table: employees
      • Permissions: CREATE, UPDATE, SELECT, DROP
      • Select User: emily

      Sample resource policy

  2. Steps to Validate Policy:

    • Log in to Databricks as a user who is defined in the resource policy.
    • Create a notebook or open an existing one, and attach it to the running Databricks cluster.
    • Select the database with which you have associated the policy.
    • In the notebook, run the following SQL commands to create a sample table, insert rows, and select the data:
      SQL
      USE sample_db;
      
      CREATE TABLE IF NOT EXISTS employees (
        Emp_Id INT,
        First_name STRING,
        Last_name STRING
      );
      
      INSERT INTO employees (Emp_Id, First_name, Last_name) VALUES
        (1, 'John', 'Doe'),
        (2, 'Jane', 'Smith'),
        (3, 'Alice', 'Johnson');
      
      SELECT * FROM employees;
      
    • On the Privacera portal, go to Access Management -> Audits
    • Check whether access under the resource policy succeeded or failed. Successful access is shown as Allowed, and failed access is shown as Denied.
    • Sample Audit Logs:

      Sample audit logs
