
Configuring Databricks Cluster FGAC with JWT

JWT Auth Configuration

By default, Privacera uses the user from the Databricks session for authorization in Databricks. Privacera also supports an alternate authentication mechanism, JWT (JSON Web Token), which uses the user/group from the JWT payload instead of the user from the Databricks session. This is useful when submitting jobs as service users, where a temporary JWT token can be used to impersonate the ETL user/group.

Databricks now supports Service Principals, which may be an alternative to JWT token authentication. For more information, refer to Service Principals.

Pre Read

Before proceeding with this topic, read how Privacera uses JWT for authentication.

Prerequisites:

  1. A JWT provider should be configured in Privacera Manager. Refer to Configuring JWT Providers for more information.
  2. Make sure the Ranger access policies cover the users/groups carried in the JWT token.
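
The second prerequisite can be checked by inspecting the JWT payload directly. The sketch below decodes the payload segment of a token using only the standard library (no signature verification); the token and its `client_id`/`scope` values are hypothetical placeholders standing in for the userKey/groupKey claims your provider actually issues.

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build a hypothetical token whose payload carries client_id/scope claims
payload = base64.urlsafe_b64encode(
    json.dumps({"client_id": "jwt_user", "scope": "etl_group"}).encode()
).decode().rstrip("=")
token = f"header.{payload}.signature"

claims = decode_jwt_payload(token)
print(claims["client_id"], claims["scope"])  # → jwt_user etl_group
```

The printed values are the identities that must appear in your Ranger access policies before JWT-based requests will be authorized.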

Configuration

  1. Go to the Privacera Manager directory and edit the Databricks configuration file to enable JWT authentication.

Bash
cd ~/privacera/privacera-manager
vi config/custom-vars/vars.databricks.plugin.yml

  2. Add the following property to enable JWT for Databricks:

Bash
DATABRICKS_JWT_OAUTH_ENABLE: "true"

  3. After all the changes are done, regenerate the Helm charts, apply them, and run the post-install steps.

Step 1 - Run setup, which generates the Helm charts. This step usually takes a few minutes.

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh setup
Step 2 - Apply the Privacera Manager helm charts.
Bash
cd ~/privacera/privacera-manager
./pm_with_helm.sh upgrade
Step 3 - Run the post-installation step, which generates the plugin tarball, updates Route 53 DNS, and so on.

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh post-install

Set the common properties in the Spark configuration of the Databricks cluster, then choose one of the public key configurations below:

Static public key JWT

  1. Copy JWT Public Keys to Local Cluster File Path

    • Upload the JWT Public Key:
      • First, upload the jwttoken.pub file containing the JWT public key to the DBFS or workspace location.
      • For example, upload the key to /dbfs/user/jwt/keys.
    • Update the Init Script:
      • To copy the public keys to the local cluster file path, update the init script with the following commands:
        Bash
        export JWT_TOKEN_PUBLIC_KEY_DBFS_PATH="/dbfs/user/jwt/keys/."
        export JWT_TOKEN_PUBLIC_KEY_LOCAL_PATH="/tmp"

        cp -r ${JWT_TOKEN_PUBLIC_KEY_DBFS_PATH} ${JWT_TOKEN_PUBLIC_KEY_LOCAL_PATH}

      • This script sets the paths for the public keys in DBFS and the local cluster, then copies the keys from DBFS to the local path.
  2. Configure single static public key

    • Add the following properties to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken0.pub
      
    • Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
  3. Configure multiple static public keys

    • Add the following properties to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken.pub
      
      spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.1.token.userKey client_id
      spark.hadoop.privacera.jwt.1.token.groupKey scope
      spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.1.token.publickey /tmp/jwttoken1.pub
      
      spark.hadoop.privacera.jwt.2.token.parserType KEYCLOAK
      spark.hadoop.privacera.jwt.2.token.userKey client_id
      spark.hadoop.privacera.jwt.2.token.groupKey scope
      spark.hadoop.privacera.jwt.2.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.2.token.publickey /tmp/jwttoken2.pub
      
    • Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
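
The static key properties above follow a fixed, indexed naming pattern: `spark.hadoop.privacera.jwt.<n>.token.*`, with one block per trusted issuer. As an illustration only (this helper is not a Privacera API), the blocks can be generated programmatically; the issuer URLs and key paths below are placeholder values.

```python
def jwt_spark_confs(parsers):
    """Build indexed spark.hadoop.privacera.jwt.<n>.token.* properties
    from a list of per-issuer settings (illustrative helper only)."""
    confs = {}
    for i, p in enumerate(parsers):
        prefix = f"spark.hadoop.privacera.jwt.{i}.token"
        confs[f"{prefix}.parserType"] = p["parserType"]
        confs[f"{prefix}.userKey"] = p.get("userKey", "client_id")
        confs[f"{prefix}.groupKey"] = p.get("groupKey", "scope")
        confs[f"{prefix}.issuer"] = p["issuer"]
        confs[f"{prefix}.publickey"] = p["publickey"]
    return confs

# Placeholder issuers and key paths mirroring the example configuration
confs = jwt_spark_confs([
    {"parserType": "PING_IDENTITY", "issuer": "https://example.com/issuer",
     "publickey": "/tmp/jwttoken0.pub"},
    {"parserType": "KEYCLOAK", "issuer": "https://example.com/issuer",
     "publickey": "/tmp/jwttoken1.pub"},
])
for key, value in sorted(confs.items()):
    print(key, value)
```

Each generated key/value pair goes into the cluster's Spark configuration exactly as shown in the listing above.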

Dynamic public key JWT

  1. Configure single dynamic public key

    • Add the following properties to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.0.token.publickey.provider.key.id kid
      
    • Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
  2. Configure multiple dynamic public keys

    • Add the following properties to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.0.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.0.token.publickey.provider.key.id kid
      
      spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.1.token.userKey client_id
      spark.hadoop.privacera.jwt.1.token.groupKey scope
      spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.1.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.1.token.publickey.provider.key.id kid
      
    • Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.
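
With the dynamic configuration, the plugin fetches the public key from the provider URL at runtime: the `key.id` setting (kid) names the key-identifier field and the `response.key` setting (x5c) names the field holding the certificate. The sketch below illustrates that lookup against a hypothetical JWKS-style response; the actual response shape and field names depend on your provider.

```python
import json

# Hypothetical JWKS-style response body; the real shape depends on your provider
response_body = json.dumps({
    "keys": [
        {"kid": "key-2024", "x5c": ["PUBLIC-KEY-ONE"]},
        {"kid": "key-2025", "x5c": ["PUBLIC-KEY-TWO"]},
    ]
})

def public_key_for_kid(body: str, kid: str) -> str:
    """Return the x5c entry whose kid matches the requested key id."""
    for key in json.loads(body)["keys"]:
        if key["kid"] == kid:
            return key["x5c"][0]
    raise KeyError(kid)

print(public_key_for_kid(response_body, "key-2025"))  # → PUBLIC-KEY-TWO
```

This is why the provider URL in the configuration ends with `?kid=`: the plugin appends the kid from the incoming token's header to select the matching key.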

Static and dynamic public keys JWT

  1. Configure static and dynamic public keys
    • Add the following properties to the Spark configuration of the Databricks cluster, along with the common properties:
      Bash
      spark.hadoop.privacera.jwt.0.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.0.token.userKey client_id
      spark.hadoop.privacera.jwt.0.token.groupKey scope
      spark.hadoop.privacera.jwt.0.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.0.token.publickey /tmp/jwttoken0.pub
      
      spark.hadoop.privacera.jwt.1.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.1.token.userKey client_id
      spark.hadoop.privacera.jwt.1.token.groupKey scope
      spark.hadoop.privacera.jwt.1.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.1.token.publickey.provider.url https://<JWKS-provider>/get_public_key?kid=
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.type basic
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.username <username>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.auth.password <password>
      spark.hadoop.privacera.jwt.1.token.publickey.provider.response.key x5c
      spark.hadoop.privacera.jwt.1.token.publickey.provider.key.id kid
      
      spark.hadoop.privacera.jwt.2.token.parserType PING_IDENTITY
      spark.hadoop.privacera.jwt.2.token.userKey client_id
      spark.hadoop.privacera.jwt.2.token.groupKey scope
      spark.hadoop.privacera.jwt.2.token.issuer https://example.com/issuer
      spark.hadoop.privacera.jwt.2.token.publickey /tmp/jwttoken1.pub
      
    • Save the changes and click on Start or, if the cluster is running, click on Confirm and Restart.

Validation

Prerequisites:

  • A running Databricks cluster secured with the above steps.

Steps to Validate:

  1. Log in to Databricks.
  2. Create or open an existing notebook and associate it with the running Databricks cluster.
  3. To use JWT in the Privacera Databricks integration, copy the JWT token file or string to the cluster's local filesystem. Use the following commands, replacing <jwt_token> with your actual JWT token value.
    Python
    jwt_file_path = "/tmp/jwttoken.dat"
    token = "<jwt_token>"

    # Write the token to the cluster's local filesystem
    with open(jwt_file_path, "w") as f:
        f.write(token)

    # Check the file content
    with open(jwt_file_path) as f:
        print(f.read())

    • Use the following PySpark commands to verify S3 CSV file read access.
      Python
      # Define the S3 path to your file
      s3_path = "s3a://your-bucket-name/path/to/your/file"
      
      # Read the CSV file from the specified S3 path
      df = spark.read.format("csv").option("header", "true").load(s3_path)
      
      # Display the first 5 rows of the dataframe
      df.show(5)
      
    • On the Privacera portal, go to Access Management -> Audits
    • Check for the user that you specified in the payload while creating the JWT token, e.g., jwt_user.
    • Check for the success or failure of the resource policy. A successful access is indicated as Allowed and a failure is indicated as Denied.
