
Troubleshooting for Access Management for Databricks all-purpose compute clusters with Fine-Grained Access Control (FGAC)

Steps to get installation logs and version file

Here are the steps to get the installation logs and the Privacera plugin version file from DBFS:

  • The Privacera init script generates two files in DBFS at dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/.
  • Command to list the files in the DBFS location:

    Bash
    dbfs ls dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/
    

    • The folder contains two files:
      • privacera.out : Installation log
      • privacera_version.txt : Privacera plugin version details.
  • Command to copy the files to your local machine:

    Bash
    dbfs cp dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/ . --recursive
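
Once the two files are on your local machine, they can be checked with a short script. This is a minimal sketch, not part of the Privacera tooling: the function name check_privacera_logs is hypothetical, and it assumes the folder was copied to a local directory with the dbfs cp command above. It confirms that both files are present, prints the plugin version, and lists any privacera.out lines containing "error" (case-insensitive), which covers "Error: ..." messages such as those written by custom_build.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of Privacera's tooling): inspect a local copy of
# dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/ made with `dbfs cp ... --recursive`.
check_privacera_logs() {
  local dir="${1:-.}"
  local f

  # Fail early if either of the two expected files is missing.
  for f in privacera.out privacera_version.txt; do
    if [[ ! -f "${dir}/${f}" ]]; then
      echo "missing: ${dir}/${f}" >&2
      return 1
    fi
  done

  echo "== Privacera plugin version =="
  cat "${dir}/privacera_version.txt"

  echo "== Lines in privacera.out mentioning errors =="
  # Case-insensitive match; `|| true` keeps a clean log from being reported as a failure.
  grep -i "error" "${dir}/privacera.out" || true
}
```

For example, after copying the files into the current directory, run `check_privacera_logs .`. A non-zero return status means one of the two files is missing.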
    

Steps to enable logs for a Databricks cluster

Here are the steps to enable logs for the com.privacera package:

  1. Prerequisites:
    • A running Databricks cluster.
  2. Log in to the Databricks web UI.
  3. Click the Workspace icon in the sidebar.
  4. Navigate to the folder where you want to create the debug script.
  5. Click the Create icon at the top right, then click File. Name the file debug.sh and add the content for your Databricks Runtime version:

    Enable logs in Databricks Runtime Version 10.4 LTS

    Note: Set LOG_LEVEL to INFO, DEBUG, or TRACE as needed.

    debug.sh
    #!/bin/bash
    
    PRIVACERA_OUT_FILE=/root/privacera/privacera.out
    PRIVACERA_CLUSTER_LOGS_DIR=${PRIVACERA_CLUSTER_LOGS_DIR:-/dbfs/privacera/cluster-logs/${DB_CLUSTER_NAME}}
    # Log level for the com.privacera package: INFO, DEBUG, or TRACE
    LOG_LEVEL=INFO
    
    # Append a timestamped message to privacera.out
    function log(){
      local msg=$1
      echo "$(date) : ${msg} " >> "${PRIVACERA_OUT_FILE}"
    }
    
    log "======================debug.sh execution started!!!======================"
    log "enabling ${LOG_LEVEL} logs in driver...."
    # Append Privacera and Ranger logger settings to the driver log4j configuration
    DRIVER_LOG4J_CONF_FILE=/databricks/spark/dbconf/log4j/driver/log4j.properties
    echo "#Privacera HERE " >> "${DRIVER_LOG4J_CONF_FILE}"
    echo "log4j.rootLogger=INFO,publicFile" >> "${DRIVER_LOG4J_CONF_FILE}"
    echo "log4j.category.com.privacera=${LOG_LEVEL},publicFile" >> "${DRIVER_LOG4J_CONF_FILE}"
    echo "log4j.additivity.com.privacera=false" >> "${DRIVER_LOG4J_CONF_FILE}"
    echo "log4j.category.org.apache.ranger=ALL,publicFile" >> "${DRIVER_LOG4J_CONF_FILE}"
    echo "log4j.additivity.org.apache.ranger=false" >> "${DRIVER_LOG4J_CONF_FILE}"
    echo "log4j.appender.publicFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n" >> "${DRIVER_LOG4J_CONF_FILE}"
    log "enabled ${LOG_LEVEL} logs in driver successfully!!"
    
    log "enabling ${LOG_LEVEL} logs in executor...."
    # Append Privacera and Ranger logger settings to the executor log4j configuration (console appender)
    EXECUTOR_LOG4J_CONF_FILE=/databricks/spark/dbconf/log4j/executor/log4j.properties
    echo "#Privacera HERE " >> "${EXECUTOR_LOG4J_CONF_FILE}"
    echo "log4j.rootLogger=INFO, console" >> "${EXECUTOR_LOG4J_CONF_FILE}"
    echo "log4j.category.com.privacera=${LOG_LEVEL},console" >> "${EXECUTOR_LOG4J_CONF_FILE}"
    echo "log4j.additivity.com.privacera=false" >> "${EXECUTOR_LOG4J_CONF_FILE}"
    echo "log4j.category.org.apache.ranger=WARN,console" >> "${EXECUTOR_LOG4J_CONF_FILE}"
    echo "log4j.additivity.org.apache.ranger=false" >> "${EXECUTOR_LOG4J_CONF_FILE}"
    echo "log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n" >> "${EXECUTOR_LOG4J_CONF_FILE}"
    echo "#Privacera HERE " >> "${EXECUTOR_LOG4J_CONF_FILE}"
    log "enabled ${LOG_LEVEL} logs in executor successfully!!"
    log "======================debug.sh execution completed!!!======================"
    
    # Copy privacera.out to DBFS so it can be downloaded later
    log "Copying privacera.out to ${PRIVACERA_CLUSTER_LOGS_DIR}"
    mkdir -p "${PRIVACERA_CLUSTER_LOGS_DIR}"
    cp "${PRIVACERA_OUT_FILE}" "${PRIVACERA_CLUSTER_LOGS_DIR}/privacera.out"
    
    Enable debug logs in Databricks Runtime Version 11.3 LTS and above

    Note: In the <Logger name="com.privacera" level="<log-level>" additivity="false"> element of both the driver and executor configurations, set the level to INFO, DEBUG, or TRACE as needed. The script below uses INFO.

    debug.sh
    #!/bin/bash
    
    PRIVACERA_OUT_FILE=/root/privacera/privacera.out
    PRIVACERA_CLUSTER_LOGS_DIR=${PRIVACERA_CLUSTER_LOGS_DIR:-/dbfs/privacera/cluster-logs/${DB_CLUSTER_NAME}}
    
    # Append a timestamped message to privacera.out
    function log(){
      local msg=$1
      echo "$(date) : ${msg} " >> "${PRIVACERA_OUT_FILE}"
    }
    
    log "======================debug.sh execution started!!!======================"
    log "enabling logs in driver...."
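    # Overwrite the driver log4j2 configuration; the quoted 'EOF' writes the XML literally, without shell expansion
    LOG4J_2_DRIVER_FILE_PATH="/databricks/spark/dbconf/log4j/driver/log4j2.xml"
    cat << 'EOF' > "${LOG4J_2_DRIVER_FILE_PATH}"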
    <?xml version="1.0" encoding="UTF-8"?><Configuration status="INFO" packages="com.databricks.logging" shutdownHook="disable">
      <Appenders>
        <RollingFile name="publicFile.rolling" fileName="logs/log4j-active.log" filePattern="logs/log4j-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
        </RollingFile>
        <Rewrite name="publicFile.rolling.rewrite">
          <ServiceRewriteAppender/>
          <AppenderRef ref="publicFile.rolling"/>
        </Rewrite>
        <RollingFile name="privateFile.rolling" fileName="logs/active.log" filePattern="logs/%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c{1}: %m%n%ex"/>
        </RollingFile>
        <Rewrite name="privateFile.rolling.rewrite">
          <ServiceRewriteAppender/>
          <AppenderRef ref="privateFile.rolling"/>
        </Rewrite>
        <RollingFile name="com.databricks.UsageLogging.appender" fileName="logs/usage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.usage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%m%n%ex"/>
        </RollingFile>
        <RollingFile name="com.databricks.ProductLogging.appender" fileName="logs/product.json" filePattern="logs/%d{yyyy-MM-dd-HH}.product.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%m%n%ex"/>
        </RollingFile>
        <RollingFile name="com.databricks.LineageLogging.appender" fileName="logs/lineage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.lineage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%m%n%ex"/>
        </RollingFile>
        <RollingFile name="com.databricks.MetricsLogging.appender" fileName="logs/metrics.json" filePattern="logs/%d{yyyy-MM-dd-HH}.metrics.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%m%n%ex"/>
        </RollingFile>
        <RollingFile name="dltExecution.rolling" fileName="logs/dlt-execution.log" filePattern="logs/dlt-execution-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
        </RollingFile>
        <Rewrite name="dltExecution.rolling.rewrite">
          <ServiceRewriteAppender/>
          <AppenderRef ref="dltExecution.rolling"/>
        </Rewrite>
      </Appenders>
    
      <Loggers>
        <Root level="INFO">
          <AppenderRef ref="publicFile.rolling.rewrite"/>
        </Root>
        <Logger name="privateLog" level="INFO" additivity="false">
          <AppenderRef ref="privateFile.rolling.rewrite"/>
        </Logger>
        <Logger name="com.databricks.UsageLogging" level="INFO" additivity="false">
          <AppenderRef ref="com.databricks.UsageLogging.appender"/>
        </Logger>
        <Logger name="com.databricks.ProductLogging" level="INFO" additivity="false">
          <AppenderRef ref="com.databricks.ProductLogging.appender"/>
        </Logger>
        <Logger name="com.databricks.LineageLogging" level="INFO" additivity="false">
          <AppenderRef ref="com.databricks.LineageLogging.appender"/>
        </Logger>
        <Logger name="com.databricks.MetricsLogging" level="INFO" additivity="false">
          <AppenderRef ref="com.databricks.MetricsLogging.appender"/>
        </Logger>
        <Logger name="com.databricks.pipelines" level="INFO" additivity="true">
          <AppenderRef ref="dltExecution.rolling.rewrite"/>
        </Logger>
        <Logger name="org.apache.spark.rdd.NewHadoopRDD" level="WARN"/>
        <Logger name="com.microsoft.azure.datalake.store" level="DEBUG"/>
        <Logger name="com.microsoft.azure.datalake.store.HttpTransport" level="DEBUG">
          <RegexFilter onMatch="DENY" onMismatch="NEUTRAL" regex=".*HTTPRequest,Succeeded.*"/>
        </Logger>
        <Logger name="com.microsoft.azure.datalake.store.HttpTransport.tokens" level="DEBUG"/>
        <!-- privacera -->
        <Logger name="com.privacera" level="INFO" additivity="false">
          <AppenderRef ref="publicFile.rolling.rewrite" />
        </Logger>
        <Logger name="org.apache.ranger" level="DEBUG" additivity="false">
          <AppenderRef ref="publicFile.rolling.rewrite" />
        </Logger>
      </Loggers>
    
    </Configuration>
    EOF
    log "enabled logs in driver successfully!!"
    
    log "enabling logs in executor...."
    # Overwrite the executor log4j2 configuration
    LOG4J_2_EXECUTOR_FILE_PATH="/databricks/spark/dbconf/log4j/executor/log4j2.xml"
    cat << 'EOF' > "${LOG4J_2_EXECUTOR_FILE_PATH}"
    <?xml version="1.0" encoding="UTF-8"?><Configuration status="INFO" packages="com.databricks.logging" shutdownHook="disable">
      <Appenders>
        <Console name="console" target="SYSTEM_ERR">
          <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
        </Console>
        <Rewrite name="console.rewrite">
          <ServiceRewriteAppender/>
          <AppenderRef ref="console"/>
        </Rewrite>
        <RollingFile name="com.databricks.UsageLogging.appender" fileName="logs/usage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.usage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%m%n%ex"/>
        </RollingFile>
    
        <RollingFile name="publicFile.rolling" fileName="logs/log4j-active.log" filePattern="logs/log4j-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
          <Policies>
            <TimeBasedTriggeringPolicy/>
          </Policies>
          <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
        </RollingFile>
        <Rewrite name="publicFile.rolling.rewrite">
          <ServiceRewriteAppender/>
          <AppenderRef ref="publicFile.rolling"/>
        </Rewrite>
      </Appenders>
    
      <Loggers>
        <Logger name="org.apache.spark.rdd.NewHadoopRDD" level="WARN"/>
        <Logger name="com.microsoft.azure.datalake.store" level="DEBUG"/>
        <Logger name="com.microsoft.azure.datalake.store.HttpTransport" level="DEBUG">
          <RegexFilter onMatch="DENY" onMismatch="NEUTRAL" regex=".*HTTPRequest,Succeeded.*"/>
        </Logger>
        <Logger name="com.microsoft.azure.datalake.store.HttpTransport.tokens" level="DEBUG"/>
        <Logger name="com.databricks.UsageLogging" level="INFO" additivity="false">
          <AppenderRef ref="com.databricks.UsageLogging.appender"/>
        </Logger>
        <Root level="INFO">
          <AppenderRef ref="console.rewrite"/>
          <AppenderRef ref="publicFile.rolling.rewrite"/>
        </Root>
    
        <!-- privacera -->
        <Logger name="com.privacera" level="INFO" additivity="false">
          <AppenderRef ref="console.rewrite" />
          <AppenderRef ref="publicFile.rolling.rewrite"/>
        </Logger>
        <Logger name="org.apache.ranger" level="INFO" additivity="false">
          <AppenderRef ref="console.rewrite" />
          <AppenderRef ref="publicFile.rolling.rewrite"/>
        </Logger>
      </Loggers>
    
    </Configuration>
    EOF
    log "enabled logs in executor successfully!!"
    log "======================debug.sh execution completed!!!======================"
    
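    # Copy privacera.out to DBFS so it can be downloaded later
    log "Copying privacera.out to ${PRIVACERA_CLUSTER_LOGS_DIR}"
    mkdir -p "${PRIVACERA_CLUSTER_LOGS_DIR}"
    cp "${PRIVACERA_OUT_FILE}" "${PRIVACERA_CLUSTER_LOGS_DIR}/privacera.out"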
    
  6. Save the debug.sh file.

  7. Click the Compute icon in the sidebar, select the cluster where you want to enable logs, and click Edit.
  8. Scroll down to Advanced options and click Init Scripts.
    • Select Workspace as the source for init scripts.
    • Specify the path of the debug.sh script created in step 5. Add it after the Privacera plugin init script (ranger_enable.sh for FGAC or ranger_enable_scala.sh for OLAC), which must be the first init script.
  9. Click the Add button.
  10. In the Logging section, define the path where logs are stored:

    • Select DBFS as the destination for the logs folder.
    • In Log Path, enter the path, for example dbfs:/logs/<cluster-name>/<folder>.
  11. Click Confirm to save changes.

  12. Click Start to restart the cluster.
  13. To copy the logs to your local machine, run the following command:
    Bash
    dbfs cp -r dbfs:/logs/<cluster-name>/<folder> <local-folder>
    
  14. Once the cluster has started, check privacera.out to confirm that debug.sh ran. Refer to Steps to get installation logs and version file for how to download privacera.out.
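
Once the logs are on your local machine, you can narrow them down to Privacera entries. The following is a minimal sketch, not an official tool: the function name privacera_log_lines is hypothetical, and it assumes every line follows the ConversionPattern set in debug.sh (%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n), so the third whitespace-separated field is the level and the fourth begins with the logger name. It reads files through gzip -dcf, so plain log files and rotated .log.gz files can be passed together.

```shell
#!/bin/bash
# Hypothetical helper: print log lines written by com.privacera loggers at a given
# level. Assumes the layout "%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n", i.e.
# field 1 = date, field 2 = time, field 3 = level, field 4 = logger:line.
privacera_log_lines() {
  local level="$1"
  shift
  # gzip -dcf decompresses .gz files and passes plain-text files through unchanged;
  # with no file arguments it reads standard input.
  gzip -dcf "$@" | awk -v lvl="${level}" '$3 == lvl && index($4, "com.privacera") == 1'
}
```

For example, `privacera_log_lines DEBUG <local-folder>/<path-to>/log4j-active.log` prints only DEBUG lines from com.privacera loggers. The subfolder layout under <local-folder> is determined by Databricks cluster log delivery, so locate the driver log file first. Continuation lines such as stack traces do not match the pattern and are skipped.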

Steps to deploy a custom build in a Databricks cluster

Here are the steps to deploy a custom build in a Databricks cluster:

  1. Prerequisites:
    • A running Databricks cluster.
  2. Log in to the Databricks web UI.
  3. Click the Workspace icon in the sidebar.
  4. Navigate to the folder where you want to create the custom build script.
  5. Click the Create icon at the top right, then click File. Name the file custom_build.sh and add the following content:

    Script to deploy custom build in Databricks cluster
    custom_build.sh
    #!/bin/bash
    
    PRIVACERA_OUT_FILE=/root/privacera/privacera.out
    PRIVACERA_CLUSTER_LOGS_DIR=${PRIVACERA_CLUSTER_LOGS_DIR:-/dbfs/privacera/cluster-logs/${DB_CLUSTER_NAME}}
    
    # Append a timestamped message to privacera.out
    function log(){
      local msg=$1
      echo "$(date) : ${msg} " >> "${PRIVACERA_OUT_FILE}"
    }
    
    # Copy privacera.out to DBFS whenever the script exits, including the error
    # paths below, so a failed deployment can still be diagnosed from DBFS
    function copy_logs(){
      log "Copying privacera.out to ${PRIVACERA_CLUSTER_LOGS_DIR}"
      mkdir -p "${PRIVACERA_CLUSTER_LOGS_DIR}"
      cp "${PRIVACERA_OUT_FILE}" "${PRIVACERA_CLUSTER_LOGS_DIR}/privacera.out"
    }
    trap copy_logs EXIT
    
    log "======================custom_build.sh execution started!!!======================"
    if [[ -z "${CUSTOM_BUILD_PKG}" ]]; then
      log "Error: CUSTOM_BUILD_PKG is not set or is empty. Please provide a valid URL in Environment variables."
      exit 1
    fi
    
    log "Creating temporary directory for the custom build..."
    mkdir -p /tmp/custom
    cd /tmp/custom || exit 1
    
    log "Downloading custom Privacera Spark Plugin from ${CUSTOM_BUILD_PKG}..."
    if ! wget "${CUSTOM_BUILD_PKG}" -O privacera-spark-plugin.tar.gz; then
      log "Error: Failed to download the package from ${CUSTOM_BUILD_PKG}. Please check the URL."
      exit 1
    fi
    
    if ! tar -xzf privacera-spark-plugin.tar.gz; then
      log "Error: Failed to extract privacera-spark-plugin.tar.gz."
      exit 1
    fi
    
    # Stop before removing the existing jars if the package has no spark-plugin/ directory
    if [[ ! -d spark-plugin ]]; then
      log "Error: spark-plugin directory not found in privacera-spark-plugin.tar.gz."
      exit 1
    fi
    
    OLD_MD5_SUM=$(md5sum /databricks/jars/privacera-agent.jar | awk '{print $1}')
    log "md5 checksum of the existing privacera-agent.jar: ${OLD_MD5_SUM}"
    log "Removing existing Privacera and Ranger jars..."
    rm -r /databricks/jars/ranger-* /databricks/jars/privacera-agent.jar
    
    log "Copying new jars to /databricks/jars/..."
    cp -r spark-plugin/* /databricks/jars/
    
    MD5_SUM=$(md5sum /databricks/jars/privacera-agent.jar | awk '{print $1}')
    log "md5 checksum of the privacera-agent.jar: ${MD5_SUM}"
    log "Deployed custom build successfully."
    log "======================custom_build.sh execution completed!!!======================"
    
  6. Save the custom_build.sh file.

  7. Click the Compute icon in the sidebar, select the cluster where you want to deploy the custom build, and click Edit.
  8. Scroll down to Advanced options and click Init Scripts.
    • Select Workspace as the source for init scripts.
    • Specify the path of the custom_build.sh script created in step 5. Add it after the Privacera plugin init script (ranger_enable.sh for FGAC or ranger_enable_scala.sh for OLAC), which must be the first init script.
  9. Click the Add button.
  10. Go to the Spark tab and, in the Environment variables section, add the following environment variable:
    Bash
    CUSTOM_BUILD_PKG=<custom package URL provided by Privacera>
    
  11. Click Confirm to save changes.
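  12. Click Start to restart the cluster.
  13. Once the cluster has started, check privacera.out to confirm the custom build deployment; the two md5 checksum lines show privacera-agent.jar before and after the change. Refer to Steps to get installation logs and version file for how to download privacera.out.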
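
The two md5 checksum lines that custom_build.sh writes can also be compared automatically. The following is a minimal sketch, not part of Privacera's tooling: the function name check_custom_build is hypothetical, and it assumes privacera.out has been downloaded locally and contains the two log messages exactly as custom_build.sh writes them. It returns 2 if either checksum is missing, returns 1 if they are identical, and otherwise prints both values.

```shell
#!/bin/bash
# Hypothetical helper: compare the before/after privacera-agent.jar checksums that
# custom_build.sh logs to privacera.out.
check_custom_build() {
  local out_file="$1"
  local old_sum new_sum

  # Extract the 32-hex-digit checksum from each message; keep the last match in
  # case the file contains more than one run.
  old_sum=$(sed -n 's/.*md5 checksum of the existing privacera-agent\.jar: \([0-9a-f]\{32\}\).*/\1/p' "${out_file}" | tail -n 1)
  new_sum=$(sed -n 's/.*md5 checksum of the privacera-agent\.jar: \([0-9a-f]\{32\}\).*/\1/p' "${out_file}" | tail -n 1)

  if [[ -z "${old_sum}" || -z "${new_sum}" ]]; then
    echo "checksum lines not found in ${out_file}" >&2
    return 2
  fi
  if [[ "${old_sum}" == "${new_sum}" ]]; then
    echo "privacera-agent.jar unchanged (${new_sum})" >&2
    return 1
  fi
  echo "privacera-agent.jar changed: ${old_sum} -> ${new_sum}"
}
```

For example, run `check_custom_build ./privacera.out` after downloading the file. Identical checksums do not by themselves prove a failed deployment, since the custom package could contain the same jar; treat that result as a prompt to read privacera.out in full.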
