Advanced Configuration for Access Management on Databricks All-Purpose Compute Clusters with Fine-Grained Access Control (FGAC)¶
JWT Auth Configuration¶
By default, Privacera uses the Databricks login user for authorization. However, JWT (JSON Web Token) integration is also supported; with it, the user and groups are taken from the JWT payload instead of the Databricks login user.
The following steps describe how to configure JWT token integration.
Configuration¶
Prerequisites¶
- The username used in the `client_id` claim and the group names used in the `scope` claim of the JWT payload must be created in the Users/Groups/Roles section of the Privacera Access Management Portal.
- These users or groups must be granted the required permissions in the Ranger policies for access control.
Set the common properties below in the Spark configuration of the Databricks cluster.
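The exact property keys come from the Privacera-generated configuration; the block below is only an illustrative sketch with hypothetical property names for the user and group claims named in the prerequisites.

```bash
# Illustrative sketch only -- the property names below are hypothetical
# placeholders; use the exact keys from your Privacera-generated configuration.
spark.hadoop.privacera.jwt.oauth.enable true
spark.hadoop.privacera.jwt.token.userKey client_id   # claim carrying the username
spark.hadoop.privacera.jwt.token.groupKey scope      # claim carrying the group names
```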
SSH to the instance where Privacera Manager is installed.
To enable JWT, copy `vars.jwt-auth.yaml` from `sample-vars` to `custom-vars`.
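For example, assuming the standard Privacera Manager layout used elsewhere in this guide:

```bash
cd ~/privacera/privacera-manager
cp config/sample-vars/vars.jwt-auth.yaml config/custom-vars/
```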
Static public key JWT:¶
- Configure a static public key:
    - Add the key properties to the `vars.jwt-auth.yaml` file. For a single static key, use only one entry in `JWT_CONFIGURATION_LIST`; for multiple static keys, add multiple entries (see the sketch after this list).
    - Add the public keys in JWT token files under `config/custom-properties`. For a single static key, you only need to create one `jwttoken.pub` file.
    - Once the properties are configured, run the Privacera Manager setup and install actions (refer to the Privacera Manager documentation).
    - Use the updated `ranger_enable.sh` script in Databricks cluster creation.
    - Click on Start or, if the cluster is running, click on Confirm and Restart.
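A representative single-key entry is sketched below; the field names are assumptions about the shape of the configuration, so use the keys documented in the copied sample file.

```bash
JWT_CONFIGURATION_LIST:
  - index: 0
    issuer: "https://your-idp.example.com"   # assumption: your token issuer
    publickey: "jwttoken.pub"                # file created under config/custom-properties
    userkey: "client_id"                     # claim that carries the username
    groupkey: "scope"                        # claim that carries the group names
  # For multiple static keys, add one entry per key.
```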
Dynamic Public Key JWT:¶
- Configure a dynamic public key:
    - Add the required properties to the `vars.jwt-auth.yaml` file.
    - Once the properties are configured, run the commands shown below to generate and upload the configuration.
    - Use the updated `ranger_enable.sh` script in Databricks cluster creation.
    - Click on Start or, if the cluster is running, click on Confirm and Restart.
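The generate-and-upload step is the usual Privacera Manager update run; a sketch assuming the standard layout:

```bash
cd ~/privacera/privacera-manager
./privacera-manager.sh update
```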
Static and Dynamic public keys JWT:¶
- Configure static and dynamic public keys:
    - Add the required properties to the `vars.jwt-auth.yaml` file.
    - Add the static JWT public key in a JWT token file under `config/custom-properties`.
    - Once the properties are configured, run the same commands as above to generate and upload the configuration.
    - Use the updated `ranger_enable.sh` script in Databricks cluster creation.
    - Click on Start or, if the cluster is running, click on Confirm and Restart.
Set the common properties in the Spark configuration of the Databricks cluster (see the sketch earlier in this section), then add the key-specific properties described below:
Static public key JWT:¶
- Copy the JWT public keys to a local cluster file path:
    - Upload the JWT public key: upload the `jwttoken.pub` file containing the JWT public key to a DBFS or workspace location, for example `/dbfs/user/jwt/keys`.
    - Update the init script: add commands that copy the public keys from DBFS to the local cluster file path (see the sketch after this list). The snippet sets the paths for the public keys in DBFS and on the local cluster, then copies the keys from DBFS to the local path.
- Configure a single static public key: add the key-specific properties in the Spark configuration of the Databricks cluster along with the common properties (see the sketch after this list). Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
- Configure multiple static public keys: add the properties for each key in the Spark configuration of the Databricks cluster along with the common properties. Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
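A sketch of the init-script addition described above; the DBFS path matches the example upload location, while the local path and variable names are assumptions:

```bash
# Copy JWT public keys from DBFS to the local cluster file path.
JWT_KEYS_DBFS_PATH="/dbfs/user/jwt/keys"   # where the .pub files were uploaded
JWT_KEYS_LOCAL_PATH="/tmp/jwt/keys"        # assumption: local path the plugin reads
mkdir -p "${JWT_KEYS_LOCAL_PATH}"
cp "${JWT_KEYS_DBFS_PATH}"/*.pub "${JWT_KEYS_LOCAL_PATH}/"
```

For the key-specific Spark properties, the key name below is a hypothetical placeholder; consult the Privacera documentation for the exact keys:

```bash
# Hypothetical property name -- single static key pointing at the copied file.
spark.hadoop.privacera.jwt.token.publickey /tmp/jwt/keys/jwttoken.pub
```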
Dynamic public key JWT:¶
- Configure a single dynamic public key or multiple dynamic public keys by adding the corresponding properties in the Spark configuration of the Databricks cluster along with the common properties.
- Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
Static and Dynamic public keys JWT:¶
- Configure static and dynamic public keys by adding the corresponding properties in the Spark configuration of the Databricks cluster along with the common properties.
- Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
Validation¶
- Prerequisites:
    - A running Databricks cluster secured with the above steps.
- Steps to Validate:
    - Log in to Databricks.
    - Create or open an existing notebook and associate it with the running Databricks cluster.
    - To use JWT in the Privacera Databricks integration, copy the JWT token file or string to the cluster's local file system; replace `<jwt_token>` with your actual JWT value (see the sketch after this list).
    - Use PySpark commands to verify S3 CSV file read access (also shown in the sketch).
    - On the Privacera portal, go to Access Management -> Audits.
    - Check for the user that you mentioned in the payload while creating the JWT token, e.g., `jwt_user`.
    - Check for the success or failure of the resource policy. A successful access is indicated as Allowed and a failure is indicated as Denied.
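A sketch of the notebook steps, with an assumed local token path and a hypothetical S3 location:

```python
# Write the JWT string to a local file on the driver. The target path is an
# assumption -- use the token file location configured for your cluster.
token = "<jwt_token>"  # replace with your actual JWT value
dbutils.fs.put("file:/tmp/privacera_jwt_token.dat", token, overwrite=True)

# Verify S3 CSV read access; the bucket and object key are hypothetical.
df = spark.read.csv("s3a://your-bucket/sample/data.csv", header=True)
df.show(5)
```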
Use Custom Service repo¶
Creating a Service repo¶
Custom services must be added outside a security zone; they will not work inside a security zone.
Assume you want to create a new service repo with the prefix "dev". Perform the following steps to create a custom S3 Ranger policy repo; the same steps apply to other custom services such as Hive, Files, ADLS, etc.
- Login to Privacera portal.
- Go to Access Management -> Resource Policies.
- Under s3, click the more icon.
- Select Add Service.
- Under Add Service, provide values for the following fields:
    - Service Name: Provide a name for the service, for example 'dev_s3'.
    - Click the toggle to turn on the Active Status.
    - Under Select Tag Service, select 'privacera_tag' from the drop-down list.
    - Provide the username as 's3'.
    - Provide the Common Name for Certificate as 'Ranger'.
    - Click SAVE.
Updating Custom Repo Name in Databricks¶
There are two ways to include the custom repository name. You can choose either of the following methods:
- Manually update the ranger_enable.sh (init script):
    - Open the `ranger_enable.sh` script.
    - Update the service-name-prefix property with the prefix you used for creating the new service repo, e.g., `dev`. By default, it is `privacera` (see the sketches after this list).
    - Save the file and use it in the Databricks cluster creation.
    - Click on Start or, if the cluster is running, click on Confirm and Restart.
- Update the vars.databricks.plugin.yml file:
    - SSH to the instance where Privacera Manager is installed.
    - Navigate to the custom-vars directory and open the `vars.databricks.plugin.yml` file.
    - Uncomment the `DATABRICKS_SERVICE_NAME_PREFIX` property and update it with your custom service name prefix.
    - Once the property is configured, run the commands to generate and upload the configuration (see the sketches after this list).
    - Use the updated `ranger_enable.sh` script in the Databricks cluster creation.
    - Click on Start or, if the cluster is running, click on Confirm and Restart.
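Sketches for both methods. In the first, the shell variable name is a hypothetical placeholder, so locate the actual prefix setting in your generated `ranger_enable.sh`; the second uses the `DATABRICKS_SERVICE_NAME_PREFIX` property named above and assumes the standard Privacera Manager layout.

```bash
# Method 1 -- in ranger_enable.sh (variable name is hypothetical):
SERVICE_NAME_PREFIX="dev"   # default: "privacera"
```

```bash
# Method 2 -- on the Privacera Manager host:
cd ~/privacera/privacera-manager/config/custom-vars
vi vars.databricks.plugin.yml
# Uncomment and set:
# DATABRICKS_SERVICE_NAME_PREFIX: "dev"

# Then regenerate and upload the configuration:
cd ~/privacera/privacera-manager
./privacera-manager.sh update
```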
If your clusters use the `privacera_databricks.sh` init script instead, there are three ways to include the custom repository name. You can choose any one of the following methods:
- Update the privacera_databricks.sh (init script):
    - Open the `privacera_databricks.sh` script.
    - Add a line after `API_SERVER_URL="https://xxxxxxxx/api"` to include the custom repository name (see the sketches after this list).
    - Save the file and use it in Databricks cluster creation.
    - Click on Start or, if the cluster is running, click on Confirm and Restart.
- Set an Environment Variable at the Databricks Cluster Level:
    - Log in to the Databricks workspace.
    - Navigate to the cluster configuration.
    - Click on Edit -> Advanced options.
    - Click on the Spark tab and add the prefix environment variable under Environment variables (see the sketches after this list).
    - Save and click on Start or, if the cluster is running, click on Restart.
- Set an Environment Variable in the Databricks Cluster Policy:
    - Create or update an existing Databricks cluster policy with a JSON block that fixes the environment variable (see the sketches after this list).
    - Create or update a cluster with the above policy to set the environment variable on the cluster.
    - Set the Spark configuration as done in step 2.
    - Save and click on Start or, if the cluster is running, click on Confirm and Restart.
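Sketches for the three methods. The prefix variable name is a hypothetical placeholder throughout (confirm the exact name against the Privacera documentation); the cluster-policy JSON uses the standard Databricks policy format.

```bash
# Method 1 -- in privacera_databricks.sh, after the API_SERVER_URL line
# (SERVICE_NAME_PREFIX is a hypothetical variable name):
API_SERVER_URL="https://xxxxxxxx/api"
SERVICE_NAME_PREFIX="dev"
```

```bash
# Method 2 -- environment variable on the cluster (name is hypothetical):
PRIVACERA_SERVICE_NAME_PREFIX=dev
```

Method 3 -- cluster policy (the environment-variable name is again hypothetical):

```json
{
  "spark_env_vars.PRIVACERA_SERVICE_NAME_PREFIX": {
    "type": "fixed",
    "value": "dev"
  }
}
```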
!!! note
    When the custom service repo is not defined using any of these methods, the plugin will by default use the service repos starting with "privacera".
Validation/Verification¶
To confirm the successful association of the custom S3 service repo, perform the following steps. The steps are similar for other services like Hive, Files, Adls, etc.:
- Prerequisites:
    - A running Databricks cluster secured using the above steps.
- Steps to Validate:
    - Log in to Databricks.
    - Create or open an existing notebook and associate it with the running Databricks cluster.
    - Use PySpark commands to verify read access to an S3 CSV file (see the sketch in the JWT validation section above).
    - On the Privacera portal, go to Access Management -> Audits.
    - Check for the Service Name that you mentioned when creating the service repo, e.g., `dev_s3`.
    - Check for the success or failure of the resource policy. A successful access is indicated as Allowed and a failure is indicated as Denied.
Fallback to Default Service-Def¶
After using the custom service, you might need to revert to the default service definition. Follow these steps:
- Manually update the ranger_enable.sh (init script):
    - Open the `ranger_enable.sh` script.
    - Update the service-name-prefix property back to the default value, `privacera` (see the combined sketch after this list).
    - Save the file and use it in Databricks cluster creation.
    - Click on Start or, if the cluster is running, click on Confirm and Restart.
- Update the vars.databricks.plugin.yml file:
    - SSH to the instance where Privacera Manager is installed.
    - Navigate to the custom-vars directory and open the `vars.databricks.plugin.yml` file.
    - Comment out the `DATABRICKS_SERVICE_NAME_PREFIX` property.
    - Once the property is configured, run the commands to generate and upload the configuration (see the combined sketch after this list).
    - Use the updated `ranger_enable.sh` script in Databricks cluster creation.
    - Click on Start or, if the cluster is running, click on Confirm and Restart.
- Update the privacera_databricks.sh (init script):
    - Open the `privacera_databricks.sh` script.
    - Remove the custom-repository line that was added after `API_SERVER_URL`.
    - Save the file and use it in Databricks cluster creation.
    - Click on Start or, if the cluster is running, click on Confirm and Restart.
- Remove the Environment Variable at the Databricks Cluster Level:
    - Log in to the Databricks workspace.
    - Navigate to the cluster configuration.
    - Click on Edit -> Advanced options.
    - Click on the Spark tab and remove the prefix environment variable from Environment variables.
    - Save and click on Start or, if the cluster is running, click on Restart.
- Remove the Environment Variable from the Databricks Cluster Policy:
    - Update the existing Databricks cluster policy by removing the JSON block that was added earlier.
    - Create or update a cluster with the updated policy.
    - Remove the environment variable at the Databricks cluster level as done in step 2.
    - Save and click on Start or, if the cluster is running, click on Confirm and Restart.
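A consolidated sketch of the fallback edits, under the same assumptions and hypothetical names as the sketches above:

```bash
# ranger_enable.sh -- set the prefix back to the default (variable name is hypothetical):
SERVICE_NAME_PREFIX="privacera"

# vars.databricks.plugin.yml -- comment the property out, then regenerate and upload:
# DATABRICKS_SERVICE_NAME_PREFIX: "dev"
cd ~/privacera/privacera-manager
./privacera-manager.sh update
```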
Use Service Principal Id for Authorization¶
By default, Privacera uses the display name of a Service Principal for authorization. If you want to use the Service Principal Id instead, perform the following steps:
- Log in to the Databricks workspace.
- In the left-hand sidebar, click on Compute.
- Choose the cluster where you want to configure the Service Principal Id.
- Click on Edit -> Advanced options.
- Click on the Spark tab.
- Add the corresponding property in the Spark config (see the sketch after this list).
- Click on Confirm.
- Click on Start, or if the cluster is running, click on Restart.
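The exact Spark property is not reproduced here; the line below is a hypothetical placeholder that only illustrates the shape of such a setting. Confirm the real key with the Privacera documentation.

```bash
# Hypothetical property name for illustration only:
spark.hadoop.privacera.spark.use.service.principal.id true
```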
Whitelist py4j Security Manager via S3 or DBFS¶
To uphold security measures, certain Python methods are blacklisted by Databricks. However, Privacera employs some of these methods. If you wish to access these classes or methods, you can add them to a whitelisting file.
- Create the `whitelisting.txt` file:
    - This file should contain a list of packages, class constructors, or methods that you intend to whitelist (see the example after this list).
- Upload the `whitelisting.txt` file:
    - To DBFS, run the upload command shown after this list.
    - To S3, use the S3 console to upload the file to the desired location.
- Update the Databricks Spark configuration:
    - In Databricks, navigate to the Spark configuration and specify the location of the whitelisting file, using the DBFS or S3 path as appropriate (see the sketch after this list).
- Restart your cluster:
    - After making these changes, restart your Databricks cluster for the new whitelist to take effect.
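Sketches for the steps above. The whitelist entries and target paths are illustrative placeholders, and the Spark property key in the last block is a hypothetical name to be confirmed with Privacera; the `dbfs cp` command is the standard Databricks CLI upload.

Example `whitelisting.txt` contents (illustrative placeholders):

```text
org.apache.hadoop.fs.FileSystem
org.apache.spark.sql.SparkSession.sql
```

Upload to DBFS:

```bash
# Target path is an example:
dbfs cp whitelisting.txt dbfs:/privacera/whitelisting.txt
```

Spark configuration entry (property key is hypothetical):

```text
spark.hadoop.privacera.whitelist.file dbfs:/privacera/whitelisting.txt
# For S3:
spark.hadoop.privacera.whitelist.file s3://your-bucket/privacera/whitelisting.txt
```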
Whitelisting alters Databricks' default security. Ensure this is aligned with your security policies.
Managing Init Scripts manually¶
If the flag DATABRICKS_INIT_SCRIPT_WORKSPACE_FLAG_ENABLE is set to false, then you need to manually upload the init script to your Databricks Workspace.
- Copy the initialization script from the Privacera Manager host location ~/privacera/privacera-manager/output/databricks/ranger_enable.sh to your local machine.
- Log in to your Databricks account and click on the "Workspace" tab located on the left-hand side of the screen.
- Click on the "Workspace" folder. Here, create a new folder named "privacera."
- Within the "privacera" folder, create another folder named DEPLOYMENT_ENV_NAME, replacing DEPLOYMENT_ENV_NAME with the corresponding value specified in vars.privacera.yml.
- Enter the DEPLOYMENT_ENV_NAME folder. Right-click on the screen and select "Import." Choose to upload the initialization script ranger_enable.sh.
- Once the script is uploaded, it will appear in the DEPLOYMENT_ENV_NAME folder under the "privacera" folder.
Setting Up Multiple Databricks Workspaces¶
To set up multiple Databricks Workspaces, perform the following steps:
- SSH to the instance where Privacera Manager is installed.
- Open the `vars.databricks.plugin.yml` file.
- Add or update the workspace properties in the file, ensuring that the `databricks_host_url` and `token` values are updated accordingly for each workspace (see the sketch after this list).
- Once the properties are configured, run the commands to generate and upload the configuration.
- Use the updated `ranger_enable.sh` script in Databricks cluster creation.
- Click on Start or, if the cluster is running, click on Confirm and Restart.
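A sketch of the shape of the workspace list. Apart from `databricks_host_url` and `token`, which the steps above name, the surrounding key names are assumptions, so use the keys documented in the file itself:

```bash
DATABRICKS_WORKSPACES_LIST:                    # assumption: list key name
  - alias: "workspace-1"                       # assumption: per-workspace label
    databricks_host_url: "https://dbc-xxxx-1.cloud.databricks.com"
    token: "<databricks-access-token-1>"
  - alias: "workspace-2"
    databricks_host_url: "https://dbc-xxxx-2.cloud.databricks.com"
    token: "<databricks-access-token-2>"
```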
Privacera-related FGAC Spark properties¶
Feature | Description | Default Value | Possible Values |
---|---|---|---|
spark.hadoop.privacera.custom.current_user.udf.names | Maps the logged-in user to the Ranger user for row-filter policies. Any valid function name can be used, but it must be in sync with the current_user condition in the row-filter policy. | current_user() | |
spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable | Enables View-Level Access Control (using the Data_admin feature), View-Level Column Masking, and View-Level Row Filtering. | false | true/false |
spark.hadoop.privacera.spark.rowfilter.extension.enable | Enables/disables Row Filtering on tables. | true | true/false |
spark.hadoop.privacera.spark.masking.extension.enable | Enables/disables Column Masking on tables. | true | true/false |
privacera.fgac.file.ignore.path | Comma-separated list of paths that are ignored during access checks, only for the file:/ protocol. | /tmp/tmp/* | e.g., /tmp/tmp/*, /tmp/data1 |