
Configuring Databricks Cluster FGAC with JWT

JWT Auth Configuration

In Databricks, Privacera by default uses the user from the Databricks session for authorization. Privacera also supports an alternate authentication mechanism, JWT (JSON Web Token), which uses the user/group from the JWT payload instead of the user from the Databricks session. This is useful when submitting jobs as service users, where a temporary JWT token can be used to impersonate the ETL user/group.

Databricks now supports Service Principals, which may be an alternative to JWT token authentication. For more information, refer to Service Principals.

Pre Read

You should read how Privacera uses JWT for authentication before proceeding with this topic.

Prerequisites:

  1. A JWT provider must be configured in Privacera Manager. Refer to Configuring JWT Providers for more information.
  2. Make sure Ranger policies grant the required access to the users/groups carried in the JWT token.
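
For reference, a decoded JWT payload that works with the example configurations on this page might look like the sketch below. The claim names client_id and scope are assumptions taken from the userKey/groupKey properties used in the examples; your provider may use different claims.

  Python
  # Hypothetical decoded JWT payload, for illustration only.
  # "client_id" is mapped to the user (token.userKey) and "scope" to the
  # group(s) (token.groupKey) in the example Spark properties below.
  payload = {
      "client_id": "jwt_user",              # user applied for authorization
      "scope": "etl_group",                 # group(s) applied for authorization
      "iss": "https://example.com/issuer",  # must match token.issuer
      "exp": 1735689600,                    # standard expiry claim
  }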

Configuration

In Self Managed and Data Plane deployments, Databricks is automatically configured during the post-install step.

Set the common properties in the Spark configuration of the Databricks cluster; each scenario below adds its token-specific properties on top of them.
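
The exact common property names vary by Privacera release; the pair below is a sketch that enables JWT authentication and points to the token file on the cluster's local disk. Treat both the property names and the token path as assumptions to verify against your deployment (the path matches the one used in the Validation section).

  Bash
  # Assumed property names; confirm against your Privacera release
  spark.hadoop.privacera.jwt.oauth.enable true
  spark.hadoop.privacera.jwt.token.path /tmp/jwttoken.dat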

Static public key JWT

  1. Copy JWT Public Keys to Local Cluster File Path

    • Upload the JWT Public Key:
      • First, upload the jwttoken.pub file containing the JWT public key to a DBFS or workspace location.
      • For example, upload the key to /dbfs/user/jwt/keys.
    • Update the Init Script:
      • To copy the public keys to the local cluster file path, update the init script with the following commands:
        Bash
        export JWT_TOKEN_PUBLIC_KEY_DBFS_PATH="/dbfs/user/jwt/keys/."
        export JWT_TOKEN_PUBLIC_KEY_LOCAL_PATH="/tmp"
        
        cp -r ${JWT_TOKEN_PUBLIC_KEY_DBFS_PATH} ${JWT_TOKEN_PUBLIC_KEY_LOCAL_PATH}
        
      • This script sets the DBFS and local cluster paths for the public keys, then copies the keys from DBFS to the local path (a verification sketch follows at the end of this list).
  2. Configure single static public key

    • Add the properties below to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken0.pub
      
    • Save the changes and click Start; if the cluster is already running, click Confirm and Restart.
  3. Configure multiple static public keys

    • Add the properties below to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken.pub
      
      spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.1.token.userKey client_id
      spark.hadoop.privacera.jwt.1.token.groupKey scope
      spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.1.token.publickey /tmp/jwttoken1.pub
      
      spark.hadoop.privacera.jwt.2.token.parserType KEYCLOAK
      spark.hadoop.privacera.jwt.2.token.userKey client_id
      spark.hadoop.privacera.jwt.2.token.groupKey scope
      spark.hadoop.privacera.jwt.2.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.2.token.publickey /tmp/jwttoken2.pub
      
    • Save the changes and click Start; if the cluster is already running, click Confirm and Restart.
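
After the cluster restarts, you can confirm that the init script copied the static public keys by listing the local path from a notebook cell. This assumes the example paths used above:

  Bash
  %sh
  # List the public keys copied by the init script (example path from above)
  ls -l /tmp/*.pub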

Dynamic public key JWT

  1. Configure single dynamic public key

    • Add the properties below to the Spark configuration of the Databricks cluster, along with the common properties (a quick way to sanity-check the provider endpoint is sketched after this list):
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.0.token.publickey.provider.key.id kid
      
    • Save the changes and click Start; if the cluster is already running, click Confirm and Restart.
  2. Configure multiple dynamic public keys

    • Add the properties below to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.0.token.publickey.provider.key.id kid
      
      spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.1.token.userKey client_id
      spark.hadoop.privacera.jwt.1.token.groupKey scope
      spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.1.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.1.token.publickey.provider.key.id kid
      
    • Save the changes and click Start; if the cluster is already running, click Confirm and Restart.
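
Before wiring a dynamic provider into the Spark configuration, you can sanity-check the endpoint directly. The request below is a sketch using the placeholder URL and basic-auth credentials from the properties above; the response is expected to carry the key material under the configured response key (x5c) for the requested key ID (kid):

  Bash
  # Hypothetical request; substitute your provider host, credentials, and key ID
  curl -u <username>:<password> "https://<JWKS-provider>/get_public_key?kid=<key-id>"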

Static and dynamic public keys JWT

  1. Configure static and dynamic public keys
    • Add the properties below to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken0.pub
      
      spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.1.token.userKey client_id
      spark.hadoop.privacera.jwt.1.token.groupKey scope
      spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.1.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.1.token.publickey.provider.key.id kid
      
      spark.hadoop.privacera.jwt.2.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.2.token.userKey client_id
      spark.hadoop.privacera.jwt.2.token.groupKey scope
      spark.hadoop.privacera.jwt.2.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.2.token.publickey /tmp/jwttoken1.pub
      
    • Save the changes and click Start; if the cluster is already running, click Confirm and Restart.

Validation

  1. Prerequisites:
     • A running Databricks cluster secured with the steps above.
  2. Steps to Validate:
     • Log in to Databricks.
     • Create or open an existing notebook and attach it to the running Databricks cluster.
     • To use JWT in the Privacera Databricks integration, copy the JWT token string to a file on the cluster's local file system. Use the following commands, replacing <jwt_token> with your actual JWT token value.
    Python
     jwt_file_path = "/tmp/jwttoken.dat"
     token = "<jwt_token>"

     # Write the JWT token to the cluster's local file system
     with open(jwt_file_path, "w") as f:
         f.write(token)

     # Check the file content
     with open(jwt_file_path, "r") as f:
         print(f.read())
    
    • Use the following PySpark commands to verify read access to a CSV file in S3.
      Python
      # Define the S3 path to your file
      s3_path = "s3a://your-bucket-name/path/to/your/file"
      
      # Read the CSV file from the specified S3 path
      df = spark.read.format("csv").option("header", "true").load(s3_path)
      
      # Display the first 5 rows of the dataframe
      df.show(5)
      
    • On the Privacera portal, go to Access Management -> Audits.
    • Look for the user specified in the JWT token payload when the token was created, e.g., jwt_user (to inspect a token's claims locally, see the sketch after this list).
    • Check whether access succeeded or failed against the resource policy: successful access is shown as Allowed, and a failure is shown as Denied.
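
If the audit shows an unexpected user, you can inspect the token's claims locally. The following is a minimal sketch using only the Python standard library; it decodes the payload for inspection and does not verify the signature:

  Python
  import base64
  import json

  # Decode (but do not verify!) the payload segment of a JWT
  token = "<jwt_token>"
  payload_b64 = token.split(".")[1]
  # Restore the base64 padding that JWTs strip
  payload_b64 += "=" * (-len(payload_b64) % 4)
  payload = json.loads(base64.urlsafe_b64decode(payload_b64))

  # These claim names match the userKey/groupKey used in this page's examples
  print(payload.get("client_id"), payload.get("scope"))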
