Troubleshooting for Access Management for Databricks all-purpose compute clusters with Fine-Grained Access Control (FGAC)

Steps to get installation logs and version file

Here are the steps to get the installation log and Privacera plugin version files from DBFS:

  • The Privacera init script generates two files in DBFS at the location dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/
  • Command to list the files from the DBFS location:

    Bash
    dbfs ls dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/
    

    • The folder will contain two files:
      • privacera.out : the installation log
      • privacera_version.txt : Privacera plugin version details
  • Command to copy the files to your local machine:

    Bash
    dbfs cp dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/  . --recursive
    

Steps to enable logs for Databricks cluster

Perform the following steps to enable debug logs for the com.privacera package:

  1. Prerequisites:
    • A running Databricks cluster.
  2. Log in to the Databricks Web UI.
  3. Click the Workspace icon in the sidebar.
  4. Navigate to the folder where you want to create the debug script.
  5. Click the Create icon at the top right, then click File, set the file name to debug.sh, and add the content below to the file:

    Enable debug logs in Databricks Runtime Version 10.4 LTS

    Note: ROOT_LOG_LEVEL, PRIVACERA_LOG_LEVEL, and RANGER_LOG_LEVEL can be set to INFO, DEBUG, or TRACE as needed.

    debug.sh
    #!/bin/bash
    
    PRIVACERA_OUT_FILE=/root/privacera/privacera.out
    PRIVACERA_CLUSTER_LOGS_DIR=${PRIVACERA_CLUSTER_LOGS_DIR:-/dbfs/privacera/cluster-logs/${DB_CLUSTER_NAME}}
    ROOT_LOG_LEVEL=INFO
    PRIVACERA_LOG_LEVEL=DEBUG
    RANGER_LOG_LEVEL=INFO
    
    function log(){
      msg=$1
      currentTime=$(date)
      echo "${currentTime} : ${msg} " >> "${PRIVACERA_OUT_FILE}"
    }
    
    log "======================debug.sh execution started!!!======================"
    log "enabling ${PRIVACERA_LOG_LEVEL} logs in driver...."
    DRIVER_LOG4J_CONF_FILE=/databricks/spark/dbconf/log4j/driver/log4j.properties
    echo "#Privacera HERE " >> ${DRIVER_LOG4J_CONF_FILE}
    echo "log4j.rootLogger=${ROOT_LOG_LEVEL},publicFile" >> ${DRIVER_LOG4J_CONF_FILE}
    echo "log4j.category.com.privacera=${PRIVACERA_LOG_LEVEL},publicFile" >> ${DRIVER_LOG4J_CONF_FILE}
    echo "log4j.additivity.com.privacera=false" >> ${DRIVER_LOG4J_CONF_FILE}
    echo "log4j.category.org.apache.ranger=${RANGER_LOG_LEVEL},publicFile" >> ${DRIVER_LOG4J_CONF_FILE}
    echo "log4j.additivity.org.apache.ranger=false" >> ${DRIVER_LOG4J_CONF_FILE}
    echo "log4j.appender.publicFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n" >> ${DRIVER_LOG4J_CONF_FILE}
    log "enabled ${PRIVACERA_LOG_LEVEL} logs in driver successfully!!"
    
    log "enabling ${PRIVACERA_LOG_LEVEL} logs in executor...."
    EXECUTOR_LOG4J_CONF_FILE=/databricks/spark/dbconf/log4j/executor/log4j.properties
    echo "#Privacera HERE " >> ${EXECUTOR_LOG4J_CONF_FILE}
    echo "log4j.rootLogger=${ROOT_LOG_LEVEL}, console" >> ${EXECUTOR_LOG4J_CONF_FILE}
    echo "log4j.category.com.privacera=${PRIVACERA_LOG_LEVEL},console" >> ${EXECUTOR_LOG4J_CONF_FILE}
    echo "log4j.additivity.com.privacera=false" >> ${EXECUTOR_LOG4J_CONF_FILE}
    echo "log4j.category.org.apache.ranger=${RANGER_LOG_LEVEL},console" >> ${EXECUTOR_LOG4J_CONF_FILE}
    echo "log4j.additivity.org.apache.ranger=false" >> ${EXECUTOR_LOG4J_CONF_FILE}
    echo "log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n" >> ${EXECUTOR_LOG4J_CONF_FILE}
    echo "#Privacera HERE " >> ${EXECUTOR_LOG4J_CONF_FILE}
    log "enabled ${PRIVACERA_LOG_LEVEL} logs in executor successfully!!"
    log "======================debug.sh execution completed!!!======================"
    
    log "Copying privacera.out to ${PRIVACERA_CLUSTER_LOGS_DIR}"
    cp ${PRIVACERA_OUT_FILE} ${PRIVACERA_CLUSTER_LOGS_DIR}/privacera.out
    
    Enable debug logs in Databricks Runtime Version 11.3 LTS and above

    Note: ROOT_LOG_LEVEL, PRIVACERA_LOG_LEVEL, and RANGER_LOG_LEVEL can be set to INFO, DEBUG, or TRACE as needed.

    debug.sh
    #!/bin/bash
    
    PRIVACERA_OUT_FILE=/root/privacera/privacera.out
    PRIVACERA_CLUSTER_LOGS_DIR=${PRIVACERA_CLUSTER_LOGS_DIR:-/dbfs/privacera/cluster-logs/${DB_CLUSTER_NAME}}
    ROOT_LOG_LEVEL="INFO"
    PRIVACERA_LOG_LEVEL="DEBUG"
    RANGER_LOG_LEVEL="INFO"
    
    function log(){
      msg=$1
      currentTime=$(date)
      echo "${currentTime} : ${msg} " >> "${PRIVACERA_OUT_FILE}"
    }
    
    log "======================debug.sh execution started!!!======================"
    log "enabling logs in driver...."
    LOG4J_2_DRIVER_FILE_PATH="/databricks/spark/dbconf/log4j/driver/log4j2.xml"
    cat << EOF >  ${LOG4J_2_DRIVER_FILE_PATH}
    <?xml version="1.0" encoding="UTF-8"?><Configuration status="INFO" packages="com.databricks.logging" shutdownHook="disable">
      <Appenders>
        <RollingFile name="publicFile.rolling" fileName="logs/log4j-active.log" filePattern="logs/log4j-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
        </RollingFile>
        <Rewrite name="publicFile.rolling.rewrite">
          <ServiceRewriteAppender/>
          <AppenderRef ref="publicFile.rolling"/>
        </Rewrite>
        <RollingFile name="privateFile.rolling" fileName="logs/active.log" filePattern="logs/%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c{1}: %m%n%ex"/>
        </RollingFile>
        <Rewrite name="privateFile.rolling.rewrite">
          <ServiceRewriteAppender/>
          <AppenderRef ref="privateFile.rolling"/>
        </Rewrite>
        <RollingFile name="com.databricks.UsageLogging.appender" fileName="logs/usage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.usage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%m%n%ex"/>
        </RollingFile>
        <RollingFile name="com.databricks.ProductLogging.appender" fileName="logs/product.json" filePattern="logs/%d{yyyy-MM-dd-HH}.product.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%m%n%ex"/>
        </RollingFile>
        <RollingFile name="com.databricks.LineageLogging.appender" fileName="logs/lineage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.lineage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%m%n%ex"/>
        </RollingFile>
        <RollingFile name="com.databricks.MetricsLogging.appender" fileName="logs/metrics.json" filePattern="logs/%d{yyyy-MM-dd-HH}.metrics.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%m%n%ex"/>
        </RollingFile>
        <RollingFile name="dltExecution.rolling" fileName="logs/dlt-execution.log" filePattern="logs/dlt-execution-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
        </RollingFile>
        <Rewrite name="dltExecution.rolling.rewrite">
          <ServiceRewriteAppender/>
          <AppenderRef ref="dltExecution.rolling"/>
        </Rewrite>
      </Appenders>
    
      <Loggers>
        <Root level="${ROOT_LOG_LEVEL}">
          <AppenderRef ref="publicFile.rolling.rewrite"/>
        </Root>
        <Logger name="privateLog" level="INFO" additivity="false">
          <AppenderRef ref="privateFile.rolling.rewrite"/>
        </Logger>
        <Logger name="com.databricks.UsageLogging" level="INFO" additivity="false">
          <AppenderRef ref="com.databricks.UsageLogging.appender"/>
        </Logger>
        <Logger name="com.databricks.ProductLogging" level="INFO" additivity="false">
          <AppenderRef ref="com.databricks.ProductLogging.appender"/>
        </Logger>
        <Logger name="com.databricks.LineageLogging" level="INFO" additivity="false">
          <AppenderRef ref="com.databricks.LineageLogging.appender"/>
        </Logger>
        <Logger name="com.databricks.MetricsLogging" level="INFO" additivity="false">
          <AppenderRef ref="com.databricks.MetricsLogging.appender"/>
        </Logger>
        <Logger name="com.databricks.pipelines" level="INFO" additivity="true">
          <AppenderRef ref="dltExecution.rolling.rewrite"/>
        </Logger>
        <Logger name="org.apache.spark.rdd.NewHadoopRDD" level="WARN"/>
        <Logger name="com.microsoft.azure.datalake.store" level="DEBUG"/>
        <Logger name="com.microsoft.azure.datalake.store.HttpTransport" level="DEBUG">
          <RegexFilter onMatch="DENY" onMismatch="NEUTRAL" regex=".*HTTPRequest,Succeeded.*"/>
        </Logger>
        <Logger name="com.microsoft.azure.datalake.store.HttpTransport.tokens" level="DEBUG"/>
        <!-- privacera -->
        <Logger name="com.privacera" level="${PRIVACERA_LOG_LEVEL}" additivity="false">
          <AppenderRef ref="publicFile.rolling.rewrite" />
        </Logger>
        <Logger name="org.apache.ranger" level="${RANGER_LOG_LEVEL}" additivity="false">
          <AppenderRef ref="publicFile.rolling.rewrite" />
        </Logger>
      </Loggers>
    
    </Configuration>
    EOF
    log "enabled logs in driver successfully!!"
    
    log "enabling logs in executor...."
    LOG4J_2_EXECUTOR_FILE_PATH="/databricks/spark/dbconf/log4j/executor/log4j2.xml"    
    cat << EOF >  ${LOG4J_2_EXECUTOR_FILE_PATH}
    <?xml version="1.0" encoding="UTF-8"?><Configuration status="INFO" packages="com.databricks.logging" shutdownHook="disable">
      <Appenders>
        <Console name="console" target="SYSTEM_ERR">
        <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
        </Console>
        <Rewrite name="console.rewrite">
          <ServiceRewriteAppender/>
          <AppenderRef ref="console"/>
        </Rewrite>
        <RollingFile name="com.databricks.UsageLogging.appender" fileName="logs/usage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.usage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%m%n%ex"/>
        </RollingFile>
    
        <RollingFile name="publicFile.rolling" fileName="logs/log4j-active.log" filePattern="logs/log4j-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
        </RollingFile>
        <Rewrite name="publicFile.rolling.rewrite">
          <ServiceRewriteAppender/>
          <AppenderRef ref="publicFile.rolling"/>
        </Rewrite>
      </Appenders>
    
      <Loggers>
        <Logger name="org.apache.spark.rdd.NewHadoopRDD" level="WARN"/>
        <Logger name="com.microsoft.azure.datalake.store" level="DEBUG"/>
        <Logger name="com.microsoft.azure.datalake.store.HttpTransport" level="DEBUG">
          <RegexFilter onMatch="DENY" onMismatch="NEUTRAL" regex=".*HTTPRequest,Succeeded.*"/>
        </Logger>
        <Logger name="com.microsoft.azure.datalake.store.HttpTransport.tokens" level="DEBUG"/>
        <Logger name="com.databricks.UsageLogging" level="INFO" additivity="false">
          <AppenderRef ref="com.databricks.UsageLogging.appender"/>
        </Logger>
        <Root level="${ROOT_LOG_LEVEL}">
          <AppenderRef ref="console.rewrite"/>
          <AppenderRef ref="publicFile.rolling.rewrite"/>
        </Root>
    
        <!-- privacera -->
        <Logger name="com.privacera" level="${PRIVACERA_LOG_LEVEL}" additivity="false">
          <AppenderRef ref="console.rewrite" />
          <AppenderRef ref="publicFile.rolling.rewrite"/>
        </Logger>
        <Logger name="org.apache.ranger" level="${RANGER_LOG_LEVEL}" additivity="false">
          <AppenderRef ref="console.rewrite" />
          <AppenderRef ref="publicFile.rolling.rewrite"/>
        </Logger>
      </Loggers>
    
    </Configuration>
    EOF
    log "enabled logs in executor successfully!!"
    log "======================debug.sh execution completed!!!======================"
    
    log "Copying privacera.out to ${PRIVACERA_CLUSTER_LOGS_DIR}"
    cp ${PRIVACERA_OUT_FILE} ${PRIVACERA_CLUSTER_LOGS_DIR}/privacera.out
    
  6. Save the debug.sh file.

  7. Click the Compute icon in the sidebar, select the cluster on which you want to enable logs, and click Edit.
  8. Scroll down to Advanced options and click Init Scripts.
    • Select Workspace as the source for init scripts.
    • Specify the path of the debug.sh script created in step 5. Add it after the plugin script (ranger_enable.sh for FGAC or ranger_enable_scala.sh for OLAC), which must remain the first script.
  9. Click the Add button.
  10. Now, in the Logging section, define the path to store logs:

    • Select DBFS as the Destination for the logs folder.
    • In Log Path, add the path, e.g., dbfs:/logs/<cluster-name>/<folder>
  11. Click Confirm to save changes.

  12. Click Start to restart the cluster.
  13. To copy the logs to your local machine, run the command below:
    Bash
    dbfs cp -r dbfs:/logs/<cluster-name>/<folder> <local-folder>
    
  14. Once the cluster has started, check the privacera.out log to validate that the debug.sh script executed. Refer here for more details.
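One way to spot-check the downloaded log is to look for the start and completion markers that debug.sh writes. The sketch below is illustrative only: it uses a locally generated stand-in file; replace it with the privacera.out you actually copied down.

```shell
#!/bin/sh
# Illustrative check: verify a downloaded privacera.out contains the markers
# that debug.sh logs at the start and end of its execution.
check_debug_log() {
  grep -q "debug.sh execution started" "$1" && \
  grep -q "debug.sh execution completed" "$1"
}

# Stand-in for the downloaded privacera.out (replace with your real file).
SAMPLE=/tmp/privacera.out.sample
printf '%s\n' \
  'Mon Jan 1 : ======================debug.sh execution started!!!======================' \
  'Mon Jan 1 : ======================debug.sh execution completed!!!======================' \
  > "$SAMPLE"

if check_debug_log "$SAMPLE"; then
  echo "debug.sh completed"
else
  echo "debug.sh markers missing; inspect the init script output" >&2
fi
```

If either marker is missing, the init script likely failed partway through; the surrounding log lines in privacera.out usually show the failing step.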

Steps to deploy custom build in Databricks cluster

Here are the steps to deploy a custom build in a Databricks cluster:

  1. Prerequisites:
    • A running Databricks cluster.
  2. Log in to the Databricks Web UI.
  3. Click the Workspace icon in the sidebar.
  4. Navigate to the folder where you want to create the custom build script.
  5. Click the Create icon at the top right, then click File, set the file name to custom_build.sh, and add the content below to the file:

    Script to deploy custom build in Databricks cluster
    custom_build.sh
    #!/bin/bash
    
    PRIVACERA_OUT_FILE=/root/privacera/privacera.out
    PRIVACERA_CLUSTER_LOGS_DIR=${PRIVACERA_CLUSTER_LOGS_DIR:-/dbfs/privacera/cluster-logs/${DB_CLUSTER_NAME}}
    
    LOG_LEVEL=INFO
    function log(){
      msg=$1
      currentTime=$(date)
      echo "${currentTime} : ${msg} " >> "${PRIVACERA_OUT_FILE}"
    }
    
    log "======================custom_build.sh execution started!!!======================"
    if [[ -z "${CUSTOM_BUILD_PKG}" ]]; then
      log "Error: CUSTOM_BUILD_PKG is not set or is empty. Please provide a valid URL in Environments Variables."
      exit 1
    fi
    
    log "Downloading custom Privacera Spark Plugin from ${CUSTOM_BUILD_PKG}..."
    
    log "Creating temporary directory for the custom build..."
    mkdir -p /tmp/custom
    cd /tmp/custom
    
    wget ${CUSTOM_BUILD_PKG} -O privacera-spark-plugin.tar.gz
    if [[ $? -ne 0 ]]; then
      log "Error: Failed to download the package from ${CUSTOM_BUILD_PKG}. Please check the URL."
      exit 1
    fi
    
    tar -xzf privacera-spark-plugin.tar.gz
    if [[ $? -ne 0 ]]; then
      log "Error: Failed to extract privacera-spark-plugin.tar.gz."
      exit 1
    fi
    
    OLD_MD5_SUM=$(md5sum /databricks/jars/privacera-agent.jar | awk '{print $1}')
    log "md5 checksum of the existing privacera-agent.jar: ${OLD_MD5_SUM}"
    log "Removing existing Privacera and Ranger jars..."
    rm -r /databricks/jars/ranger-* /databricks/jars/privacera-agent.jar
    
    log "Copying new jars to /databricks/jars/..."
    cp -r spark-plugin/* /databricks/jars/
    
    MD5_SUM=$(md5sum /databricks/jars/privacera-agent.jar | awk '{print $1}')
    log "md5 checksum of the privacera-agent.jar: ${MD5_SUM}" 
    log "Deployed custom build successfully."
    log "======================custom_build.sh execution completed!!!======================"
    
    log "Copying privacera.out to ${PRIVACERA_CLUSTER_LOGS_DIR}"
    cp ${PRIVACERA_OUT_FILE} ${PRIVACERA_CLUSTER_LOGS_DIR}/privacera.out
    
  6. Save the custom_build.sh file.

  7. Click the Compute icon in the sidebar, select the cluster on which you want to deploy the custom build, and click Edit.
  8. Scroll down to Advanced options and click Init Scripts.
    • Select Workspace as the source for init scripts.
    • Specify the path of the custom_build.sh script created in step 5. Add it after the plugin script (ranger_enable.sh for FGAC or ranger_enable_scala.sh for OLAC), which must remain the first script.
  9. Click the Add button.
  10. Now navigate to the Spark tab and, in the Environment variables section, add the environment variable below:
    Bash
    CUSTOM_BUILD_PKG=<Custom package URL provided by Privacera>
    
  11. Click Confirm to save changes.
  12. Click Start to restart the cluster.
  13. Once the cluster has started, check the privacera.out log to validate the custom build deployment. Refer here for more details.
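The deployment can be spot-checked by comparing the old and new jar checksums that custom_build.sh logs, together with its success message. The sketch below is illustrative only and runs against a locally generated stand-in log; substitute the privacera.out you downloaded.

```shell
#!/bin/sh
# Illustrative check: confirm custom_build.sh logged a success message and
# that the md5 checksum of privacera-agent.jar changed after deployment.
SAMPLE=/tmp/custom_build.out.sample   # stand-in for the downloaded privacera.out
printf '%s\n' \
  'md5 checksum of the existing privacera-agent.jar: aaa111' \
  'md5 checksum of the privacera-agent.jar: bbb222' \
  'Deployed custom build successfully.' \
  > "$SAMPLE"

# The "existing" line is logged before the old jars are removed; the second
# line is logged after the new jars are copied in.
OLD=$(grep "existing privacera-agent.jar" "$SAMPLE" | awk '{print $NF}')
NEW=$(grep "checksum of the privacera-agent.jar" "$SAMPLE" | awk '{print $NF}')

if grep -q "Deployed custom build successfully" "$SAMPLE" && [ "$OLD" != "$NEW" ]; then
  echo "custom build deployed (checksum changed: $OLD -> $NEW)"
fi
```

If the two checksums are identical, the new jar was most likely not copied; re-check the CUSTOM_BUILD_PKG URL and the extraction step in privacera.out.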

Steps to Skip the Access Check for the Databricks Cluster Libraries Path

When attaching a library to a Databricks cluster, the plugin performs an access check on the library's path. If the required permissions are missing, the library installation fails.

To skip this access check, follow the steps below:

  1. Go to the Databricks cluster where you want to attach the libraries.
  2. Click the Edit button on the cluster.
  3. Under Advanced options, navigate to the Spark tab.
  4. In the Spark config section, add the following Spark property to skip the access check for the Databricks library path:

    Bash
    spark.hadoop.privacera.fgac.file.ignore.path dbfs:/local_disk*
    

    • If this property is already configured with other paths, append dbfs:/local_disk* to the existing list using a comma separator. For example:
      Bash
      spark.hadoop.privacera.fgac.file.ignore.path s3://existing-bucket-1,s3://existing-bucket-2,dbfs:/local_disk*
      
  5. Click the Confirm button to save the changes.

  6. Navigate to the Libraries tab to verify if libraries are already attached to the cluster.
  7. Click Install New to add a new library.
  8. Choose the appropriate Library Source and select the desired library.
  9. Click Install to install the library.
  10. Click Restart to restart the cluster.
  11. With the Spark property in place, the plugin skips access checks on the specified cluster library path.
  12. Once the cluster restarts, check the Libraries tab. The library status should display as Installed.
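The append rule in step 4 can be sketched as a small shell snippet; the variable names here are illustrative, and the combined value is what you would paste into the Spark config:

```shell
#!/bin/sh
# Build the combined value for the ignore-path property by appending the DBFS
# pattern to any existing entries (comma-separated, no spaces).
EXISTING="s3://existing-bucket-1,s3://existing-bucket-2"   # current value; may be empty
NEW_ENTRY="dbfs:/local_disk*"

if [ -z "$EXISTING" ]; then
  COMBINED="$NEW_ENTRY"
else
  COMBINED="$EXISTING,$NEW_ENTRY"
fi

echo "spark.hadoop.privacera.fgac.file.ignore.path $COMBINED"
```

Note that the list is a single comma-separated token; a space after a comma would be treated as part of the next path.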

Handling LogicalPlan Filter Conflicts in Spark FGAC Plugin

Warning

Disabling this property is not recommended when using Privacera Row Level Filter policies.

In Spark FGAC (Fine-Grained Access Control) use cases, the plugin transforms the LogicalPlan in multiple phases using custom Scala rules. These transformations support scenarios such as view-based access control, row-level filtering (RLF), and column masking.

However, in certain cases, the incoming LogicalPlan may contain multiple filter conditions. This can lead to incorrect evaluation by the plugin, potentially causing query failures due to runtime exceptions.

For instance, the plugin might throw a ClassCastException if it encounters a filter condition that is incompatible with the expected data type. Such issues typically arise when a filter is applied to a column that has been transformed or implicitly cast to a different type.

To troubleshoot or mitigate this issue, you can disable the plugin's modification of the LogicalPlan by following the steps below:

  1. Go to the Databricks cluster where you want to apply the change.
  2. Click the Edit button on the cluster.
  3. Under Advanced options, navigate to the Spark tab.
  4. In the Spark config section, add the following Spark property to skip the plugin's transformation logic for Filter conditions:
    Bash
    spark.hadoop.privacera.fgac.wa.partition.filter.enable false
    
  5. Click the Confirm button to save the changes.
  6. Click Restart to restart the cluster.

Tip

If the Privacera Row Level Filter policy is enabled and the issue persists, enable the Enhanced Extension. For configuration steps, refer to the advanced configurations here.
