
Configuring Databricks Cluster FGAC with JWT

JWT Auth Configuration

By default, Privacera uses the user from the Databricks session for authorization in Databricks. Privacera also supports an alternate authentication mechanism, JWT (JSON Web Token), which uses the user/group from the JWT payload instead of the user from the Databricks session. This is useful when submitting jobs as service users, where a temporary JWT token can be used to impersonate the ETL user/group.

Databricks now supports Service Principals, which may be an alternative to JWT token authentication. For more information, refer to Service Principals.

Pre Read

Before proceeding with this topic, read how Privacera uses JWT for authentication.

Prerequisites:

  1. A JWT provider should be configured in Privacera Manager. Refer to Configuring JWT Providers for more information.
  2. Make sure the Ranger access policies cover the users/groups carried in the JWT token.
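
The second prerequisite can be checked by inspecting the JWT payload directly. The sketch below decodes the payload segment of a token using only the standard library (no signature verification); the token and its `client_id`/`scope` values are hypothetical placeholders standing in for the userKey/groupKey claims your provider actually issues.

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build a hypothetical token whose payload carries client_id/scope claims
payload = base64.urlsafe_b64encode(
    json.dumps({"client_id": "jwt_user", "scope": "etl_group"}).encode()
).decode().rstrip("=")
token = f"header.{payload}.signature"

claims = decode_jwt_payload(token)
print(claims["client_id"], claims["scope"])  # → jwt_user etl_group
```

The printed values are the identities that must appear in your Ranger access policies before JWT-based requests will be authorized.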

Configuration

  1. Go to the Privacera Manager directory and edit the Databricks configuration file to enable JWT authentication.

Bash
cd ~/privacera/privacera-manager
vi config/custom-vars/vars.databricks.plugin.yml

  2. Add the following property to enable JWT for Databricks:

Bash
DATABRICKS_JWT_OAUTH_ENABLE: "true"

  3. After all the changes are done, regenerate the Helm charts, apply them, and run the post-install steps.

Step 1 - Run setup, which generates the Helm charts. This step usually takes a few minutes.

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh setup
Step 2 - Apply the Privacera Manager helm charts.
Bash
cd ~/privacera/privacera-manager
./pm_with_helm.sh upgrade
Step 3 - Run the post-installation step, which generates the plugin tarball, updates Route 53 DNS, and so on.

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh post-install

Set the common properties in the Spark configuration of the Databricks cluster, then choose one of the public key configurations below:

Static public key JWT

  1. Copy JWT Public Keys to Local Cluster File Path

    • Upload the JWT Public Key:
      • First, upload the jwttoken.pub file containing the JWT public key to the DBFS or workspace location.
      • For example, upload the key to /dbfs/user/jwt/keys.
    • Update the Init Script:
      • To copy the public keys to the local cluster file path, update the init script with the following commands:
        Bash
        export JWT_TOKEN_PUBLIC_KEY_DBFS_PATH="/dbfs/user/jwt/keys/."
        export JWT_TOKEN_PUBLIC_KEY_LOCAL_PATH="/tmp"

        cp -r ${JWT_TOKEN_PUBLIC_KEY_DBFS_PATH} ${JWT_TOKEN_PUBLIC_KEY_LOCAL_PATH}

      • This script sets the paths for the public keys in DBFS and the local cluster, then copies the keys from DBFS to the local path.
  2. Configure single static public key

    • Add the following properties to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken0.pub
      
    • Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
  3. Configure multiple static public keys

    • Add the following properties to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken.pub
      
      spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.1.token.userKey client_id
      spark.hadoop.privacera.jwt.1.token.groupKey scope
      spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.1.token.publickey /tmp/jwttoken1.pub
      
      spark.hadoop.privacera.jwt.2.token.parserType KEYCLOAK
      spark.hadoop.privacera.jwt.2.token.userKey client_id
      spark.hadoop.privacera.jwt.2.token.groupKey scope
      spark.hadoop.privacera.jwt.2.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.2.token.publickey /tmp/jwttoken2.pub
      
    • Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
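
The static key properties above follow a fixed, indexed naming pattern: `spark.hadoop.privacera.jwt.<n>.token.*`, with one block per trusted issuer. As an illustration only (this helper is not a Privacera API), the blocks can be generated programmatically; the issuer URLs and key paths below are placeholder values.

```python
def jwt_spark_confs(parsers):
    """Build indexed spark.hadoop.privacera.jwt.<n>.token.* properties
    from a list of per-issuer settings (illustrative helper only)."""
    confs = {}
    for i, p in enumerate(parsers):
        prefix = f"spark.hadoop.privacera.jwt.{i}.token"
        confs[f"{prefix}.parserType"] = p["parserType"]
        confs[f"{prefix}.userKey"] = p.get("userKey", "client_id")
        confs[f"{prefix}.groupKey"] = p.get("groupKey", "scope")
        confs[f"{prefix}.issuer"] = p["issuer"]
        confs[f"{prefix}.publickey"] = p["publickey"]
    return confs

# Placeholder issuers and key paths mirroring the example configuration
confs = jwt_spark_confs([
    {"parserType": "PING_IDENTITY", "issuer": "https://example.com/issuer",
     "publickey": "/tmp/jwttoken0.pub"},
    {"parserType": "KEYCLOAK", "issuer": "https://example.com/issuer",
     "publickey": "/tmp/jwttoken1.pub"},
])
for key, value in sorted(confs.items()):
    print(key, value)
```

Each generated key/value pair goes into the cluster's Spark configuration exactly as shown in the listing above.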

Dynamic public key JWT

  1. Configure single dynamic public key

    • Add the following properties to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.0.token.publickey.provider.key.id kid
      
    • Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
  2. Configure multiple dynamic public keys

    • Add the following properties to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.0.token.publickey.provider.key.id kid
      
      spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.1.token.userKey client_id
      spark.hadoop.privacera.jwt.1.token.groupKey scope
      spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.1.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.1.token.publickey.provider.key.id kid
      
    • Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
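
With the dynamic configuration, the plugin fetches the public key from the provider URL at runtime: the `key.id` setting (kid) names the key-identifier field and the `response.key` setting (x5c) names the field holding the certificate. The sketch below illustrates that lookup against a hypothetical JWKS-style response; the actual response shape and field names depend on your provider.

```python
import json

# Hypothetical JWKS-style response body; the real shape depends on your provider
response_body = json.dumps({
    "keys": [
        {"kid": "key-2024", "x5c": ["PUBLIC-KEY-ONE"]},
        {"kid": "key-2025", "x5c": ["PUBLIC-KEY-TWO"]},
    ]
})

def public_key_for_kid(body: str, kid: str) -> str:
    """Return the x5c entry whose kid matches the requested key id."""
    for key in json.loads(body)["keys"]:
        if key["kid"] == kid:
            return key["x5c"][0]
    raise KeyError(kid)

print(public_key_for_kid(response_body, "key-2025"))  # → PUBLIC-KEY-TWO
```

This is why the provider URL in the configuration ends with `?kid=`: the plugin appends the kid from the incoming token's header to select the matching key.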

Static and dynamic public keys JWT

  1. Configure static and dynamic public keys
    • Add the following properties to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken0.pub
      
      spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.1.token.userKey client_id
      spark.hadoop.privacera.jwt.1.token.groupKey scope
      spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.1.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.1.token.publickey.provider.key.id kid
      
      spark.hadoop.privacera.jwt.2.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.2.token.userKey client_id
      spark.hadoop.privacera.jwt.2.token.groupKey scope
      spark.hadoop.privacera.jwt.2.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.2.token.publickey /tmp/jwttoken1.pub
      
    • Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.

Validation

Prerequisites:

  • A running Databricks cluster secured with the above steps.

Steps to Validate:

  1. Log in to Databricks.
  2. Create or open an existing notebook and associate it with the running Databricks cluster.
  3. To use JWT in the Privacera Databricks integration, copy the JWT token file or string to the cluster's local filesystem. Use the following commands, replacing <jwt_token> with your actual JWT token value.
    Python
    jwt_file_path = "/tmp/jwttoken.dat"
    token = "<jwt_token>"

    # Write the token to the cluster's local filesystem
    with open(jwt_file_path, "w") as f:
        f.write(token)

    # Check the file content
    with open(jwt_file_path) as f:
        print(f.read())

    • Use the following PySpark commands to verify S3 CSV file read access.
      Python
      # Define the S3 path to your file
      s3_path = "s3a://your-bucket-name/path/to/your/file"
      
      # Read the CSV file from the specified S3 path
      df = spark.read.format("csv").option("header", "true").load(s3_path)
      
      # Display the first 5 rows of the dataframe
      df.show(5)
      
    • On the Privacera portal, go to Access Management -> Audits
    • Check for the user that you specified in the payload while creating the JWT token, e.g., jwt_user.
    • Check for the success or failure of the resource policy. A successful access is indicated as Allowed and a failure is indicated as Denied.
