# Configuring Databricks Cluster FGAC with JWT

## JWT Auth Configuration

By default, Privacera uses the user from the Databricks session for authorization. Privacera also supports an alternate authentication mechanism, JWT (JSON Web Token), which uses the user/group from the JWT payload instead of the user from the Databricks session. This is useful when submitting jobs as service users, where a temporary JWT token can be used to impersonate the ETL user/group.
Databricks now supports Service Principals, which may be an alternative to JWT token authentication. For more information, refer to Service Principals.
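In the configurations below, the user and group are read from the `client_id` and `scope` claims of the token payload. As a quick way to see what a given token carries, the following minimal sketch decodes the payload without verifying the signature (for inspection only; replace the placeholder with a real token):

```python
import base64
import json

def peek_claims(token: str) -> dict:
    """Decode a JWT's payload segment without verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

claims = peek_claims("<jwt_token>")  # replace with your actual token
print(claims.get("client_id"), claims.get("scope"))  # user and group claims
```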
### Prerequisites

- A JWT provider should be configured in Privacera Manager. Refer to Configuring JWT Providers for more information.
- Make sure Ranger access policies exist for the users/groups carried in the JWT token.

### Configuration

- **Self Managed and Data Plane:** Databricks is automatically configured during the post-install step.
- **PrivaceraCloud:** Set the common properties below in the Spark configuration of the Databricks cluster.
### Static public key JWT

#### Copy JWT Public Keys to Local Cluster File Path

1. **Upload the JWT public key:** First, upload the `jwttoken.pub` file containing the JWT public key to a DBFS or workspace location, for example `/dbfs/user/jwt/keys`.
2. **Update the init script:** To copy the public keys to the local cluster file path, update the init script with the following commands:

```bash
export JWT_TOKEN_PUBLIC_KEY_DBFS_PATH="/dbfs/user/jwt/keys/."
export JWT_TOKEN_PUBLIC_KEY_LOCAL_PATH="/tmp"
cp -r ${JWT_TOKEN_PUBLIC_KEY_DBFS_PATH} ${JWT_TOKEN_PUBLIC_KEY_LOCAL_PATH}
```

This script sets the paths for the public keys in DBFS and on the local cluster, then copies the keys from DBFS to the local path.
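To confirm that the init script copied the keys, you can list the local path from a notebook cell (a minimal check, assuming the example paths above):

```python
import os

# List public key files copied to the local path by the init script
print([f for f in os.listdir("/tmp") if f.endswith(".pub")])
```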
#### Configure single static public key

Add the properties below in the Spark configuration of the Databricks cluster, along with the common properties:

```bash
spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
spark.hadoop.privacera.jwt.0.token.userKey client_id
spark.hadoop.privacera.jwt.0.token.groupKey scope
spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken0.pub
```

Save the changes and click Start or, if the cluster is running, click Confirm and Restart.
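With this configuration, the cluster accepts tokens signed by the private key that pairs with `/tmp/jwttoken0.pub`. The following sketch shows one way to mint such a token with the PyJWT library; the private-key file name and the claim values are illustrative placeholders:

```python
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT; RS256 also requires the cryptography package

# Hypothetical private key pairing with the configured jwttoken0.pub
with open("jwt_private.key") as f:
    private_key = f.read()

payload = {
    "client_id": "etl_user",              # read via token.userKey
    "scope": "etl_group",                 # read via token.groupKey
    "iss": "https://example.com/issuer",  # must match token.issuer
    "exp": datetime.now(timezone.utc) + timedelta(hours=1),
}
print(jwt.encode(payload, private_key, algorithm="RS256"))
```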
#### Configure multiple static public keys

Add the properties below in the Spark configuration of the Databricks cluster, along with the common properties:

```bash
spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
spark.hadoop.privacera.jwt.0.token.userKey client_id
spark.hadoop.privacera.jwt.0.token.groupKey scope
spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken.pub
spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
spark.hadoop.privacera.jwt.1.token.userKey client_id
spark.hadoop.privacera.jwt.1.token.groupKey scope
spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
spark.hadoop.privacera.jwt.1.token.publickey /tmp/jwttoken1.pub
spark.hadoop.privacera.jwt.2.token.parserType KEYCLOAK
spark.hadoop.privacera.jwt.2.token.userKey client_id
spark.hadoop.privacera.jwt.2.token.groupKey scope
spark.hadoop.privacera.jwt.2.token.issuer https://example.com/issuer
spark.hadoop.privacera.jwt.2.token.publickey /tmp/jwttoken2.pub
```

Save the changes and click Start or, if the cluster is running, click Confirm and Restart.
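With multiple static keys configured, a token is accepted if it verifies against any one of the configured entries. The following is a rough illustration of that behavior, not Privacera's internal code; the paths and issuer follow the example above:

```python
import jwt  # PyJWT

KEY_PATHS = ["/tmp/jwttoken.pub", "/tmp/jwttoken1.pub", "/tmp/jwttoken2.pub"]

def verify_against_any(token: str) -> dict:
    """Accept the token if any configured public key verifies it."""
    for path in KEY_PATHS:
        try:
            with open(path) as f:
                return jwt.decode(token, f.read(), algorithms=["RS256"],
                                  issuer="https://example.com/issuer")
        except jwt.InvalidTokenError:
            continue
    raise jwt.InvalidTokenError("token did not match any configured public key")
```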
### Dynamic public key JWT

#### Configure single dynamic public key

Add the properties below in the Spark configuration of the Databricks cluster, along with the common properties:

```bash
spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
spark.hadoop.privacera.jwt.0.token.userKey client_id
spark.hadoop.privacera.jwt.0.token.groupKey scope
spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
spark.hadoop.privacera.jwt.0.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.type basic
spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.username <username>
spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.password <password>
spark.hadoop.privacera.jwt.0.token.publickey.provider.response.key x5c
spark.hadoop.privacera.jwt.0.token.publickey.provider.key.id kid
```

Save the changes and click Start or, if the cluster is running, click Confirm and Restart.
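The `provider.*` properties describe an HTTP key lookup: the token's `kid` header value is appended to the provider URL, the request authenticates with basic auth, and the key material is read from the `x5c` field of the JSON response. A minimal sketch of that lookup with the `requests` library, using the placeholders from the configuration:

```python
import requests

provider_url = "https://<JWKS-provider>/get_public_key?kid="
kid = "example-key-id"  # value of the 'kid' field from the JWT header

# Basic-auth call to the provider, as configured by provider.auth.*
resp = requests.get(provider_url + kid, auth=("<username>", "<password>"), timeout=10)
resp.raise_for_status()

# provider.response.key names the JSON field holding the key material
public_key_material = resp.json()["x5c"]
```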
#### Configure multiple dynamic public keys

Add the properties below in the Spark configuration of the Databricks cluster, along with the common properties:

```bash
spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
spark.hadoop.privacera.jwt.0.token.userKey client_id
spark.hadoop.privacera.jwt.0.token.groupKey scope
spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
spark.hadoop.privacera.jwt.0.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.type basic
spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.username <username>
spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.password <password>
spark.hadoop.privacera.jwt.0.token.publickey.provider.response.key x5c
spark.hadoop.privacera.jwt.0.token.publickey.provider.key.id kid
spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
spark.hadoop.privacera.jwt.1.token.userKey client_id
spark.hadoop.privacera.jwt.1.token.groupKey scope
spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
spark.hadoop.privacera.jwt.1.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.type basic
spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.username <username>
spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.password <password>
spark.hadoop.privacera.jwt.1.token.publickey.provider.response.key x5c
spark.hadoop.privacera.jwt.1.token.publickey.provider.key.id kid
```

Save the changes and click Start or, if the cluster is running, click Confirm and Restart.
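Whether a key is a local file (static) or fetched from a provider (dynamic), token verification is the same once the key material is resolved. The next section combines both styles; as a rough illustration of the resolution step (assumed behavior for illustration, not Privacera's implementation):

```python
import requests

def resolve_public_key(source: str, kid: str = "") -> str:
    """Resolve key material from a local file path or a provider URL."""
    if source.startswith("http"):
        # Dynamic source: query the provider with the token's kid
        resp = requests.get(source + kid, auth=("<username>", "<password>"), timeout=10)
        resp.raise_for_status()
        return resp.json()["x5c"]
    # Static source: read the key from the local cluster path
    with open(source) as f:
        return f.read()
```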
### Static and Dynamic public keys JWT

#### Configure static and dynamic public keys

Add the properties below in the Spark configuration of the Databricks cluster, along with the common properties:

```bash
spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
spark.hadoop.privacera.jwt.0.token.userKey client_id
spark.hadoop.privacera.jwt.0.token.groupKey scope
spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken0.pub
spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
spark.hadoop.privacera.jwt.1.token.userKey client_id
spark.hadoop.privacera.jwt.1.token.groupKey scope
spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
spark.hadoop.privacera.jwt.1.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.type basic
spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.username <username>
spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.password <password>
spark.hadoop.privacera.jwt.1.token.publickey.provider.response.key x5c
spark.hadoop.privacera.jwt.1.token.publickey.provider.key.id kid
spark.hadoop.privacera.jwt.2.token.parserType PING_IDENTITY
spark.hadoop.privacera.jwt.2.token.userKey client_id
spark.hadoop.privacera.jwt.2.token.groupKey scope
spark.hadoop.privacera.jwt.2.token.issuer https://example.com/issuer
spark.hadoop.privacera.jwt.2.token.publickey /tmp/jwttoken1.pub
```

Save the changes and click Start or, if the cluster is running, click Confirm and Restart.

## Validation

### Prerequisites

- A running Databricks cluster secured with the above steps.

### Steps to Validate

1. Log in to Databricks.
2. Create a new notebook or open an existing one.
3. Attach the notebook to the running Databricks cluster.
4. To use JWT in the Privacera Databricks integration, copy the JWT token file or string to a local file on the cluster. Use the following commands, replacing `<jwt_token>` with your actual JWT token value:
```python
jwt_file_path = "/tmp/jwttoken.dat"
token = "<jwt_token>"

# Write the token to a local file on the cluster
with open(jwt_file_path, "w") as f:
    f.write(token)

# Check the file content
with open(jwt_file_path, "r") as f:
    print(f.read())
```
5. Use the following PySpark commands to verify S3 CSV file read access:

```python
# Define the S3 path to your file
s3_path = "s3a://your-bucket-name/path/to/your/file"

# Read the CSV file from the specified S3 path
df = spark.read.format("csv").option("header", "true").load(s3_path)

# Display the first 5 rows of the dataframe
df.show(5)
```
6. On the Privacera portal, go to Access Management -> Audits.
7. Check for the user that you specified in the payload while creating the JWT token, e.g., `jwt_user`.
8. Check for the success or failure of the resource policy. Successful access is indicated as Allowed; a failure is indicated as Denied.