Configuring Databricks FGAC with Custom Service Repository¶

Prerequisites¶

Create a new Service repo in Privacera portal with the name you want to use in Databricks.

Given below is an example of creating a custom service repo for S3.

Let’s assume you want to create a new service repo with the prefix as “dev”. Perform the following steps to create a custom s3 Ranger policy repo. Follow the same steps to add other custom services for Hive, Files, Adls, etc.

Login to Privacera portal.
Go to Access Management -> Resource Policies.
Under s3, click the more icon .
Select Add Service.
Under Add Service, provide values for the following fields:
Service Name: Provide name for the service. For example, 'dev_s3'.
Click the toggle to turn on the Active Status.
Under Select Tag Service, select 'privacera_tag' from the drop-down list.
Provide username as 's3'.
Provide Common Name for Certificate as 'Ranger'.
Click SAVE.

Updating Custom Repo Name in Databricks¶

Self Managed and Data PlanePrivaceraCloud

Update the vars.databricks.plugin.yml file:

SSH to the instance where Privacera Manager is installed.
Run the following command to navigate to the /custom-vars directory.
Bash
1
cd ~/privacera/privacera-manager/config/custom-vars
Open the vars.databricks.plugin.yml file.
Bash
1
vi vars.databricks.plugin.yml
Uncomment the DATABRICKS_SERVICE_NAME_PREFIX property and update it with your custom service name prefix.
Bash
1
DATABRICKS_SERVICE_NAME_PREFIX: "dev"
Once the property is configured, run the following commands to generate and upload the configuration
Bash
1 2 3
cd ~/privacera/privacera-manager ./privacera-manager.sh setup ./pm_with_helm.sh upgrade

Run the following command to run the post install steps:

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh post-install

- Use the updated ranger_enable.sh script in the Databricks cluster creation. - Click on Start or, if the cluster is running, click on Confirm and Restart.

There are three ways to include the custom repository name. You can choose any one of the following methods:

Update the privacera_databricks.sh (init script):
- Open the privacera_databricks.sh script.
- Add the following line after API_SERVER_URL="https://xxxxxxxx/api" to include the custom repository name:
  Bash
  1
  export SERVICE_NAME_PREFIX=dev
- Save the file and use it in Databricks cluster creation.
- Click on Start or, if the cluster is running, click on Confirm and Restart.
Set an Environment Variable at the Databricks Cluster Level:
- Log in to the Databricks workspace.
- Navigate to the cluster configuration.
- Click on Edit -> Advanced options.
- Click on the Spark tab and add the following property in Environment variables:
  Bash
  1
  SERVICE_NAME_PREFIX=dev
- Save and click on Start or, if the cluster is running, click on Restart.
Set an Environment Variable in the Databricks Cluster Policy:
- Create or update an existing Databricks cluster policy using the following json block:
  JSON
  1 2 3 4
  "spark_env_vars.SERVICE_NAME_PREFIX": { "type": "fixed", "value": "dev" }
- Create or update a cluster with the above policy to set the environment variable on the cluster.
- Set the Spark configuration as done in step 2.
- Save and click on Start or, if the cluster is running, click on Confirm and Restart.

When the custom service repo is not defined using any of these methods, the plugin will by default use the service repos starting with privacera

Validation/Verification¶

To confirm the successful association of the custom S3 service repo, perform the following steps. The steps are similar for other services like Hive, Files, ADLS, etc.:

Prerequisites:
1. A running Databricks cluster secured using the above steps.

Steps to Validate:

Login to Databricks.
Create or open an existing notebook. Associate the Notebook with the running Databricks cluster.

Use the following PySpark commands to verify read access to an S3 CSV file.

Python
# Define the S3 path to your file
s3_path = "s3a://your-bucket-name/path/to/your/file"

# Read the CSV file from the specified S3 path
df = spark.read.format("csv").option("header", "true").load(s3_path)

# Display the first 5 rows of the dataframe
df.show(5)

On the Privacera portal, go to Access Management -> Audits
Check for the Service Name that you mentioned when Creating a Service repo, e.g., dev_s3.
Check for the success or failure of the resource policy. A successful access is indicated as Allowed and a failure is indicated as Denied.

Prev topic: Advanced Configuration