
Configure Databricks Spark Fine-Grained Access Control Plugin [FGAC] [Python, SQL]

Perform the following steps to configure the Databricks Spark Fine-Grained Access Control Plugin (FGAC):

  1. Run the following commands:

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.databricks.plugin.yml config/custom-vars/
    vi config/custom-vars/vars.databricks.plugin.yml
    
  2. Edit the following properties to allow the Privacera Platform to connect to your Databricks host. For property details and descriptions, refer to the Databricks FGAC configuration properties table below.

    DATABRICKS_HOST_URL: "<PLEASE_UPDATE>"
    DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
    DATABRICKS_WORKSPACES_LIST:
    - alias: DEFAULT
      databricks_host_url: "{{DATABRICKS_HOST_URL}}"
      token: "{{DATABRICKS_TOKEN}}"
    DATABRICKS_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_ENABLE: "true"
    

    You can also add custom properties that are not included by default; see the example after these steps.

  3. Optional: To use a Databricks workspace file, set the following property to true.

    DATABRICKS_INIT_SCRIPT_WORKSPACE_FLAG_ENABLE: "true"
  4. Run the following commands:

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
  5. (Optional) By default, policies under the default service name, privacera_hive, are enforced. You can configure a different service name and enforce the policies defined under it. See Configure service name for Databricks Spark plugin on Privacera Platform.
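
A minimal completed config/custom-vars/vars.databricks.plugin.yml (step 2 above) might look like the following sketch. All values shown (the host URL, the token, and the extra command) are placeholders for illustration only, and DATABRICKS_POST_PLUGIN_COMMAND_LIST is an optional property, described in the table below, that is not part of the default YAML file.

    # Illustrative sketch of config/custom-vars/vars.databricks.plugin.yml (replace the placeholder values)
    DATABRICKS_HOST_URL: "https://dbc-xxxxxxxx-xxxx.cloud.databricks.com"
    DATABRICKS_TOKEN: "dapiXXXXXXXXXXXXXXXX"
    DATABRICKS_WORKSPACES_LIST:
    - alias: DEFAULT
      databricks_host_url: "{{DATABRICKS_HOST_URL}}"
      token: "{{DATABRICKS_TOKEN}}"
    DATABRICKS_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_ENABLE: "true"
    # Optional custom property (not in the default file); see the properties table below
    DATABRICKS_POST_PLUGIN_COMMAND_LIST:
    - "echo 'Privacera plugin init completed' >> /tmp/privacera_init.log"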

Databricks FGAC configuration properties


DATABRICKS_HOST_URL

Enter the URL where the Databricks environment is hosted.

DATABRICKS_TOKEN

Enter the Databricks token.

DATABRICKS_WORKSPACES_LIST

Add multiple Databricks workspaces to connect to Ranger.

To add a single workspace, add the following default YAML in the text area to define the host URL and token of the Databricks workspace. The text area should not be left empty and should contain at least this default entry.

Do not edit any of the values in the default entry.

- alias: DEFAULT
  databricks_host_url: "{{DATABRICKS_HOST_URL}}"
  token: "{{DATABRICKS_TOKEN}}"

To add two workspaces, use the following YAML. {{var}} is an Ansible variable that reuses the value of a predefined variable. Do not change the lowercase property names databricks_host_url and token.

- alias: DEFAULT
  databricks_host_url: "{{DATABRICKS_HOST_URL}}"
  token: "{{DATABRICKS_TOKEN}}"
- alias: "<workspace-2-alias>"
  databricks_host_url: "<workspace-2-url>"
  token: "<dbx-token-for-workspace-2>"

DATABRICKS_ENABLE

If set to 'true', Privacera Manager will create the Databricks cluster init script at ~/privacera/privacera-manager/output/databricks/ranger_enable.sh.

DATABRICKS_INIT_SCRIPT_WORKSPACE_FLAG_ENABLE

Set to 'true' to upload the init script into workspace files at the location /privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh.

DATABRICKS_MANAGE_INIT_SCRIPT

If set to 'true', Privacera Manager will upload the init script ranger_enable.sh to the identified Databricks host.

If set to 'false', upload the following two files to the DBFS location manually. The files are at ~/privacera/privacera-manager/output/databricks:

  • privacera_spark_plugin_job.conf

  • privacera_spark_plugin.conf

DATABRICKS_SPARK_PLUGIN_AGENT_JAR

Specifies the Java agent as a string of extra JVM options to pass to the Spark driver.

Example:

-javaagent:/databricks/jars/privacera-agent.jar

DATABRICKS_SPARK_PRIVACERA_CUSTOM_CURRENT_USER_UDF_NAME

Property to map the logged-in user to the Ranger user for row-filter policies. Example: current_user()

See Spark FGAC properties. Check whether this property is set in your Databricks cluster. If it is, set its value to match the PM property; if the PM property and the Databricks cluster-level property differ, it can cause unexpected behavior.

DATABRICKS_SPARK_PRIVACERA_VIEW_LEVEL_MASKING_ROWFILTER_EXTENSION_ENABLE

Property to enable masking, row-filter, and data_admin access on views. This is a Privacera Manager (PM) property.

Default: true

It is mapped to the Databricks cluster-level property spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable.

See Spark FGAC properties. Check whether this property is set in your Databricks cluster. If it is, set its value to match the PM property; if the PM property and the Databricks cluster-level property differ, it can cause unexpected behavior.

DATABRICKS_SQL_CLUSTER_POLICY_SPARK_CONF

Configure Databricks Cluster policy.

Add the following JSON in the text area:

[
   {
     "Note":"First spark conf",
     "key":"spark.hadoop.first.spark.test",
     "value":"test1"
   },
   {
     "Note":"Second spark conf",
     "key":"spark.hadoop.second.spark.test",
     "value":"test2"
   }
]

DATABRICKS_POST_PLUGIN_COMMAND_LIST

This property is not included in the default YAML file, but can be added if required.

Use this property if you want to run a specific set of commands in the Databricks init script.

The following example commands are added to the cluster init script to allow Athena JDBC via the data access server; see the YAML sketch after this table for one way to supply them through this property:

sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8181 -j ACCEPT ;
sudo curl -k -u user:password {{PORTAL_URL}}/api/dataserver/cert?type=dataserver_jks \
       -o /etc/ssl/certs/dataserver.jks 
sudo chmod 755 /etc/ssl/certs/dataserver.jks
            

DATABRICKS_SPARK_PYSPARK_ENABLE_PY4J_SECURITY

With this property you can restrict (blacklist) PySpark APIs to enable security. This is a Privacera Manager (PM) property.

It is mapped to the Databricks cluster-level property spark.databricks.pyspark.enablePy4JSecurity.

See Spark FGAC properties. Check whether this property is set in your Databricks cluster. If it is, set its value to match the PM property; if the PM property and the Databricks cluster-level property differ, it can cause unexpected behavior.
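
As referenced in the DATABRICKS_POST_PLUGIN_COMMAND_LIST entry above, one way to supply those commands is as a YAML list under the property in config/custom-vars/vars.databricks.plugin.yml. The sketch below is an illustration based on the commands shown in that entry, not a verbatim excerpt:

# Illustrative only: commands from the Athena JDBC example above
DATABRICKS_POST_PLUGIN_COMMAND_LIST:
- "sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8181 -j ACCEPT"
- "sudo curl -k -u user:password {{PORTAL_URL}}/api/dataserver/cert?type=dataserver_jks -o /etc/ssl/certs/dataserver.jks"
- "sudo chmod 755 /etc/ssl/certs/dataserver.jks"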

Managing init script

Automatic upload

If DATABRICKS_ENABLE is 'true' and DATABRICKS_MANAGE_INIT_SCRIPT is 'true', then the Init script will be uploaded automatically to your Databricks host. The init script will be uploaded to dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME mentioned in vars.privacera.yml.
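
If the Databricks CLI is configured on your workstation (see Manual upload below), you can optionally confirm the automatic upload with the same listing command used in the manual steps; the privacera profile name is just the example used throughout this page:

    # Lists ranger_enable.sh when the automatic upload has succeeded
    dbfs ls dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ --profile privacera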

Manual upload

If DATABRICKS_ENABLE is 'true' and DATABRICKS_MANAGE_INIT_SCRIPT is 'false', then the init script must be uploaded manually to your Databricks host.

To avoid the manual steps below, you should set DATABRICKS_MANAGE_INIT_SCRIPT=true and follow the instructions outlined in Automatic Upload.

  1. Open a terminal and connect to your Databricks account using your Databricks login credentials or token.

    Connect using login credentials:

    1. If you're using login credentials, then run the following command:

      databricks configure --profile privacera
    2. Enter the Databricks URL:

      Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
    3. Enter the username and password:
      Username: email-id@example.com
      Password:

    Connect using Databricks token:

    1. If you don't have a Databricks token, you can generate one.

    2. If you're using a token, run the following command:

      databricks configure --token --profile privacera
    3. Enter the Databricks URL:

      Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
    4. Enter the token:

      Token:
  2. To check if the connection to your Databricks account is established, run the following command:

    dbfs ls dbfs:/ --profile privacera

    If you are connected to your account, the output lists the files at the DBFS root.

  3. Upload files manually to Databricks:

    1. Copy the following files, which are available on the PM host at ~/privacera/privacera-manager/output/databricks, to DBFS:

      • ranger_enable.sh

      • privacera_spark_plugin.conf

      • privacera_spark_plugin_job.conf

      • privacera_custom_conf.zip

    2. Run the following commands. You can get the value of <DEPLOYMENT_ENV_NAME> from the file ~/privacera/privacera-manager/config/vars.privacera.yml.

      export DEPLOYMENT_ENV_NAME=<DEPLOYMENT_ENV_NAME>
      dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME} --profile privacera
      dbfs cp ranger_enable.sh dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_plugin.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_plugin_job.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_custom_conf.zip dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
    3. Verify that the files have been uploaded:

      dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera

      The init script is now available at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME mentioned in vars.privacera.yml.

Configure Databricks Cluster

  1. Once the update completes successfully, log on to the Databricks console with your account and open the target cluster, or create a new target cluster.

  2. Open the Cluster dialog and enter Edit mode.

  3. In the Configuration tab, select Advanced Options > Spark.

  4. Add the following content to the Spark Config edit box. For more information on the Spark config properties, see Spark FGAC properties.

    spark.databricks.cluster.profile serverless
    spark.databricks.isv.product privacera
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
    spark.databricks.repl.allowedLanguages sql,python,r
    
  5. Under Configuration, click Edit. Select Advanced and choose one of the following init script types, then set the associated file path:

     • Workspace: /privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh

     • dbfs: dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh

     For the <DEPLOYMENT_ENV_NAME> variable, enter the deployment name as defined for the DEPLOYMENT_ENV_NAME variable in the vars.privacera.yml file.

    Note

    Databricks has deprecated init scripts on DBFS and encourages the use of workspace files. To avoid warnings from Databricks, Privacera recommends that you use the Workspace option.

  6. In the Table Access Control section, clear the Enable table access control and only allow Python and SQL commands checkbox and the Enable credential passthrough for user-level data access and only allow Python and SQL commands checkbox.

  7. Save (Confirm) this configuration.

  8. Start (or Restart) the selected Databricks Cluster.

Tip

Validation

To help evaluate the use of Privacera with Databricks, Privacera provides a set of Privacera Manager 'demo' notebooks. These can be downloaded from the Privacera S3 repository using either your browser or a command-line wget. Use the notebook/SQL sequence that matches your cluster.

  1. Download the file that matches your cluster using your browser or command-line wget.

  2. Import the Databricks notebook:

    1. Log in to the Databricks Console

    2. Select Workspace > Users > Your User.

    3. From the drop-down menu, select Import and choose the downloaded file.

  3. Follow the suggested steps in the text of the notebook to exercise and validate Privacera with Databricks.