Databricks Spark Object-level Access Control Plugin [OLAC] [Scala]

Prerequisites

Before you begin, ensure that the prerequisites for this plugin are met.

Configuration
  1. Run the following commands.

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.databricks.scala.yml config/custom-vars/
    vi config/custom-vars/vars.databricks.scala.yml
    
  2. Edit the following properties. For property details and descriptions, see Configuration properties below; a filled-in sketch of this file appears after step 3.

    DATASERVER_DATABRICKS_ALLOWED_URLS : "<PLEASE_UPDATE>"
    DATASERVER_AWS_STS_ROLE: "<PLEASE_CHANGE>"
    
  3. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
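
    For reference, after step 2 a minimal config/custom-vars/vars.databricks.scala.yml might look like the sketch below. The values are illustrative placeholders (the URL format and ARN mirror the examples in the property table), not values from your environment.

      DATABRICKS_SCALA_ENABLE: "true"
      DATASERVER_DATABRICKS_ALLOWED_URLS: "https://dbc-xxxxxxxx-xxxx.cloud.databricks.com"
      DATASERVER_AWS_STS_ROLE: "arn:aws:iam::111111111111:role/assume-role"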
    
Configuration properties

DATABRICKS_SCALA_ENABLE

    Description: Set this property to enable or disable the Databricks Scala plugin. It is found under the Databricks Signed URL Configuration For Scala Clusters section.

DATASERVER_DATABRICKS_ALLOWED_URLS

    Description: Add a single URL or a comma-separated list of URLs. Privacera Dataserver serves only the URLs listed in this property.

    Example: https://xxx-7xxxfaxx-xxxx.cloud.databricks.com

DATASERVER_AWS_STS_ROLE

    Description: Add the instance profile ARN of the AWS role that can access Delta files in Databricks.

    Example: arn:aws:iam::111111111111:role/assume-role

DATABRICKS_SCALA_CLUSTER_POLICY_SPARK_CONF

    Description: Configures the Databricks cluster policy. Add the following JSON in the text area:

    [{"Note":"First spark conf",
    "key":"spark.hadoop.first.spark.test",
    "value":"test1"},
    {"Note":"Second spark conf",
    "key":"spark.hadoop.second.spark.test",
    "value":"test2"}]
Managing init script

Automatic Upload

If DATABRICKS_ENABLE is "true" and DATABRICKS_MANAGE_INIT_SCRIPT is "true", the init script is uploaded automatically to your Databricks host at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME in vars.privacera.yml.
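
To confirm the automatic upload, you can list the target folder with the Databricks CLI. In this sketch, the deployment name privacera-prod and the CLI profile privacera are assumed placeholders; substitute your own values.

    # Assumes DEPLOYMENT_ENV_NAME is "privacera-prod" and a CLI profile named "privacera".
    dbfs ls dbfs:/privacera/privacera-prod/ --profile privacera
    # The listing should include ranger_enable_scala.sh.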

Manual Upload

If DATABRICKS_ENABLE is "true" and DATABRICKS_MANAGE_INIT_SCRIPT is "false", the init script must be uploaded manually to your Databricks host. A scripted sketch of the whole procedure follows the steps below.

  1. Open a terminal and connect to your Databricks account using your Databricks login credentials or token.

    • Connect using login credentials:

      1. If you're using login credentials, run the following command.

        databricks configure --profile privacera
        
      2. Enter the Databricks URL.

        Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
        
      3. Enter the username and password.

        Username: email-id@yourdomain.com
        Password:
        
    • Connect using Databricks token:

      1. If you don't have a Databricks token, you can generate one. For more information, refer to Generate a personal access token.

      2. If you're using a token, run the following command.

        databricks configure --token --profile privacera
        
      3. Enter the Databricks URL.

        Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
        
      4. Enter the token.

        Token:
        
  2. To check if the connection to your Databricks account is established, run the following command.

    dbfs ls dbfs:/ --profile privacera
    

    If the connection succeeds, the output lists the files under dbfs:/.

  3. Upload files manually to Databricks.

    1. Copy the following files to DBFS. They are available on the Privacera Manager (PM) host at ~/privacera/privacera-manager/output/databricks:

      • ranger_enable_scala.sh

      • privacera_spark_scala_plugin.conf

      • privacera_spark_scala_plugin_job.conf

    2. Run the following commands. Take the value of <DEPLOYMENT_ENV_NAME> from ~/privacera/privacera-manager/config/vars.privacera.yml.

      export DEPLOYMENT_ENV_NAME=<DEPLOYMENT_ENV_NAME>
      dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME} --profile privacera
      dbfs cp ranger_enable_scala.sh dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_scala_plugin.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_scala_plugin_job.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      
    3. Verify that the files have been uploaded.

      dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      

      The init script is now available at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME in vars.privacera.yml.
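
If you prefer to script the whole manual upload, the sketch below is a non-interactive variant of the steps above. It assumes token-based authentication through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables read by the legacy Databricks CLI; all values are placeholders.

    #!/usr/bin/env bash
    # Hedged sketch: scripted version of the manual upload steps above.
    set -euo pipefail

    export DATABRICKS_HOST="https://dbc-xxxxxxxx-xxxx.cloud.databricks.com"  # placeholder
    export DATABRICKS_TOKEN="<personal-access-token>"                        # placeholder
    export DEPLOYMENT_ENV_NAME="privacera-prod"                              # placeholder

    cd ~/privacera/privacera-manager/output/databricks
    dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME}
    for f in ranger_enable_scala.sh privacera_spark_scala_plugin.conf privacera_spark_scala_plugin_job.conf; do
      dbfs cp --overwrite "$f" dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/
    done
    dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/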

Configure Databricks cluster
  1. Once the Privacera Manager update completes successfully, log on to the Databricks console with your account and open the target cluster, or create a new target cluster.

  2. Open the cluster dialog and enter Edit mode.

  3. In the Configuration tab, in Edit mode, open Advanced Options (at the bottom of the dialog) and then select the Spark tab.

  4. Add the following content to the Spark Config edit box. A consolidated API sketch combining these properties with the init script appears after these steps.

    New Properties

    spark.databricks.isv.product privacera
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
    spark.executor.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
    spark.databricks.repl.allowedLanguages sql,python,r,scala
    spark.databricks.delta.formatCheck.enabled false
    

    Old Properties

    spark.databricks.cluster.profile serverless
    spark.databricks.delta.formatCheck.enabled false
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
    spark.executor.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
    spark.databricks.isv.product privacera
    spark.databricks.repl.allowedLanguages sql,python,r,scala
    

    Note

    • From the Privacera 5.0.6.1 release onwards, it is recommended to replace the Old Properties with the New Properties. However, the Old Properties also continue to work.

    • For Databricks versions < 7.3, use only the Old Properties, since those versions are in extended support.

  5. (Optional) To use regional endpoint for S3 access, add the following content to the Spark Config edit box.

    spark.hadoop.fs.s3a.endpoint https://s3.<region>.amazonaws.com
    spark.hadoop.fs.s3.endpoint https://s3.<region>.amazonaws.com
    spark.hadoop.fs.s3n.endpoint https://s3.<region>.amazonaws.com
    
  6. In the Configuration tab, in Edit mode, open Advanced Options (at the bottom of the dialog) and set the init script path. For <DEPLOYMENT_ENV_NAME>, enter the value of DEPLOYMENT_ENV_NAME as defined in vars.privacera.yml.

    dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh
    
  7. Save (Confirm) this configuration.

  8. Start (or Restart) the selected Databricks Cluster.
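
As a consolidated reference, the same cluster settings can also be applied when creating a cluster through the Databricks Clusters API 2.0. The sketch below is an illustration, not a Privacera-provided payload; the cluster name, Spark version, node type, and <DEPLOYMENT_ENV_NAME> are placeholders.

    # Hedged sketch: create a cluster with the New Properties and the init script.
    curl -s -X POST "${DATABRICKS_HOST}/api/2.0/clusters/create" \
      -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{
        "cluster_name": "privacera-olac-scala",
        "spark_version": "9.1.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 1,
        "spark_conf": {
          "spark.databricks.isv.product": "privacera",
          "spark.driver.extraJavaOptions": "-javaagent:/databricks/jars/privacera-agent.jar",
          "spark.executor.extraJavaOptions": "-javaagent:/databricks/jars/privacera-agent.jar",
          "spark.databricks.repl.allowedLanguages": "sql,python,r,scala",
          "spark.databricks.delta.formatCheck.enabled": "false"
        },
        "init_scripts": [
          { "dbfs": { "destination": "dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh" } }
        ]
      }'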
