Troubleshooting Access Management for Databricks all-purpose compute clusters with Fine-Grained Access Control (FGAC)¶
Steps to get installation logs and version file¶
Here are the steps to get the installation log and the Privacera plugin version file from DBFS:
- The Privacera init script generates two files in DBFS at the location dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/
- List the files at that DBFS location. The folder will have two files:
    - privacera.out: installation log
    - privacera_version.txt: Privacera plugin version details
- Copy the files to your local machine (see the commands after this list).
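A minimal sketch of both commands using the Databricks CLI, assuming the CLI is installed and configured for your workspace; replace <CLUSTER_NAME> with your cluster name:

```bash
# List the two files generated by the Privacera init script
databricks fs ls dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/

# Copy both files to the current local directory
databricks fs cp dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/privacera.out .
databricks fs cp dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/privacera_version.txt .
```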
Steps to enable debug logs for a Databricks cluster¶
Perform the following steps to enable debug logs for the com.privacera package:
- Prerequisites:
    - A running Databricks cluster.
- Log in to the Databricks Web UI.
- Click the Workspace icon in the sidebar.
- Navigate to the folder where you want to create the debug script.
- Click the Create icon at the top right, click File, set the file name to debug.sh, and add the content below to the file. Separate scripts are provided for Databricks Runtime 10.4 LTS and for Databricks Runtime 11.3 LTS and above; the script shown here writes a Log4j 2 configuration and applies to Runtime 11.3 LTS and above.
  Note: ROOT_LOG_LEVEL, PRIVACERA_LOG_LEVEL, and RANGER_LOG_LEVEL can be set to INFO, DEBUG, or TRACE as needed.
```bash
#!/bin/bash
PRIVACERA_OUT_FILE=/root/privacera/privacera.out
PRIVACERA_CLUSTER_LOGS_DIR=${PRIVACERA_CLUSTER_LOGS_DIR:-/dbfs/privacera/cluster-logs/${DB_CLUSTER_NAME}}
ROOT_LOG_LEVEL="INFO"
PRIVACERA_LOG_LEVEL="DEBUG"
RANGER_LOG_LEVEL="INFO"

function log(){
  msg=$1
  currentTime=`date`
  echo "${currentTime} : ${msg} " >> ${PRIVACERA_OUT_FILE}
}

log "======================debug.sh execution started!!!======================"
log "enabling logs in driver...."
LOG4J_2_DRIVER_FILE_PATH="/databricks/spark/dbconf/log4j/driver/log4j2.xml"

cat << EOF > ${LOG4J_2_DRIVER_FILE_PATH}
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="INFO" packages="com.databricks.logging" shutdownHook="disable">
  <Appenders>
    <RollingFile name="publicFile.rolling" fileName="logs/log4j-active.log" filePattern="logs/log4j-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
    </RollingFile>
    <Rewrite name="publicFile.rolling.rewrite">
      <ServiceRewriteAppender/>
      <AppenderRef ref="publicFile.rolling"/>
    </Rewrite>
    <RollingFile name="privateFile.rolling" fileName="logs/active.log" filePattern="logs/%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c{1}: %m%n%ex"/>
    </RollingFile>
    <Rewrite name="privateFile.rolling.rewrite">
      <ServiceRewriteAppender/>
      <AppenderRef ref="privateFile.rolling"/>
    </Rewrite>
    <RollingFile name="com.databricks.UsageLogging.appender" fileName="logs/usage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.usage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%m%n%ex"/>
    </RollingFile>
    <RollingFile name="com.databricks.ProductLogging.appender" fileName="logs/product.json" filePattern="logs/%d{yyyy-MM-dd-HH}.product.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%m%n%ex"/>
    </RollingFile>
    <RollingFile name="com.databricks.LineageLogging.appender" fileName="logs/lineage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.lineage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%m%n%ex"/>
    </RollingFile>
    <RollingFile name="com.databricks.MetricsLogging.appender" fileName="logs/metrics.json" filePattern="logs/%d{yyyy-MM-dd-HH}.metrics.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%m%n%ex"/>
    </RollingFile>
    <RollingFile name="dltExecution.rolling" fileName="logs/dlt-execution.log" filePattern="logs/dlt-execution-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
    </RollingFile>
    <Rewrite name="dltExecution.rolling.rewrite">
      <ServiceRewriteAppender/>
      <AppenderRef ref="dltExecution.rolling"/>
    </Rewrite>
  </Appenders>
  <Loggers>
    <Root level="${ROOT_LOG_LEVEL}">
      <AppenderRef ref="publicFile.rolling.rewrite"/>
    </Root>
    <Logger name="privateLog" level="INFO" additivity="false">
      <AppenderRef ref="privateFile.rolling.rewrite"/>
    </Logger>
    <Logger name="com.databricks.UsageLogging" level="INFO" additivity="false">
      <AppenderRef ref="com.databricks.UsageLogging.appender"/>
    </Logger>
    <Logger name="com.databricks.ProductLogging" level="INFO" additivity="false">
      <AppenderRef ref="com.databricks.ProductLogging.appender"/>
    </Logger>
    <Logger name="com.databricks.LineageLogging" level="INFO" additivity="false">
      <AppenderRef ref="com.databricks.LineageLogging.appender"/>
    </Logger>
    <Logger name="com.databricks.MetricsLogging" level="INFO" additivity="false">
      <AppenderRef ref="com.databricks.MetricsLogging.appender"/>
    </Logger>
    <Logger name="com.databricks.pipelines" level="INFO" additivity="true">
      <AppenderRef ref="dltExecution.rolling.rewrite"/>
    </Logger>
    <Logger name="org.apache.spark.rdd.NewHadoopRDD" level="WARN"/>
    <Logger name="com.microsoft.azure.datalake.store" level="DEBUG"/>
    <Logger name="com.microsoft.azure.datalake.store.HttpTransport" level="DEBUG">
      <RegexFilter onMatch="DENY" onMismatch="NEUTRAL" regex=".*HTTPRequest,Succeeded.*"/>
    </Logger>
    <Logger name="com.microsoft.azure.datalake.store.HttpTransport.tokens" level="DEBUG"/>
    <!-- privacera -->
    <Logger name="com.privacera" level="${PRIVACERA_LOG_LEVEL}" additivity="false">
      <AppenderRef ref="publicFile.rolling.rewrite"/>
    </Logger>
    <Logger name="org.apache.ranger" level="${RANGER_LOG_LEVEL}" additivity="false">
      <AppenderRef ref="publicFile.rolling.rewrite"/>
    </Logger>
  </Loggers>
</Configuration>
EOF

log "enabled logs in driver successfully!!"
log "enabling logs in executor...."
LOG4J_2_EXECUTOR_FILE_PATH="/databricks/spark/dbconf/log4j/executor/log4j2.xml"

cat << EOF > ${LOG4J_2_EXECUTOR_FILE_PATH}
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="INFO" packages="com.databricks.logging" shutdownHook="disable">
  <Appenders>
    <Console name="console" target="SYSTEM_ERR">
      <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
    </Console>
    <Rewrite name="console.rewrite">
      <ServiceRewriteAppender/>
      <AppenderRef ref="console"/>
    </Rewrite>
    <RollingFile name="com.databricks.UsageLogging.appender" fileName="logs/usage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.usage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%m%n%ex"/>
    </RollingFile>
    <RollingFile name="publicFile.rolling" fileName="logs/log4j-active.log" filePattern="logs/log4j-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
    </RollingFile>
    <Rewrite name="publicFile.rolling.rewrite">
      <ServiceRewriteAppender/>
      <AppenderRef ref="publicFile.rolling"/>
    </Rewrite>
  </Appenders>
  <Loggers>
    <Logger name="org.apache.spark.rdd.NewHadoopRDD" level="WARN"/>
    <Logger name="com.microsoft.azure.datalake.store" level="DEBUG"/>
    <Logger name="com.microsoft.azure.datalake.store.HttpTransport" level="DEBUG">
      <RegexFilter onMatch="DENY" onMismatch="NEUTRAL" regex=".*HTTPRequest,Succeeded.*"/>
    </Logger>
    <Logger name="com.microsoft.azure.datalake.store.HttpTransport.tokens" level="DEBUG"/>
    <Logger name="com.databricks.UsageLogging" level="INFO" additivity="false">
      <AppenderRef ref="com.databricks.UsageLogging.appender"/>
    </Logger>
    <Root level="${ROOT_LOG_LEVEL}">
      <AppenderRef ref="console.rewrite"/>
      <AppenderRef ref="publicFile.rolling.rewrite"/>
    </Root>
    <!-- privacera -->
    <Logger name="com.privacera" level="${PRIVACERA_LOG_LEVEL}" additivity="false">
      <AppenderRef ref="console.rewrite"/>
      <AppenderRef ref="publicFile.rolling.rewrite"/>
    </Logger>
    <Logger name="org.apache.ranger" level="${RANGER_LOG_LEVEL}" additivity="false">
      <AppenderRef ref="console.rewrite"/>
      <AppenderRef ref="publicFile.rolling.rewrite"/>
    </Logger>
  </Loggers>
</Configuration>
EOF

log "enabled logs in executor successfully!!"
log "======================debug.sh execution completed!!!======================"
log "Copying privacera.out to ${PRIVACERA_CLUSTER_LOGS_DIR}"
cp ${PRIVACERA_OUT_FILE} ${PRIVACERA_CLUSTER_LOGS_DIR}/privacera.out
```
- Save the debug.sh file.
- Click the Compute icon in the sidebar, click the cluster on which logs need to be enabled, and click Edit.
- Scroll down to Advanced options and click Init Scripts.
- Select Workspace as the source for the init script.
- Specify the path of the debug.sh script created in step 5, after the plugin init script (ranger_enable.sh for FGAC or ranger_enable_scala.sh for OLAC), which must remain the first script.
- Click the Add button.
- In the Logging section, define the path where the logs will be stored:
    - Select DBFS as the Destination for the logs folder.
    - In Log Path, add the path, e.g. dbfs:/logs/<cluster-name>/<folder>.
- Click Confirm to save the changes.
- Click Start to restart the cluster.
- To copy the logs to your local machine, run the command shown after this list.
- Once the cluster has started, check the privacera.out log to validate that the debug.sh script executed. See "Steps to get installation logs and version file" above for how to retrieve it.
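A minimal sketch using the Databricks CLI, assuming the CLI is installed and configured for the workspace and that Log Path was set as above; the local destination directory is an arbitrary choice:

```bash
# Recursively copy the cluster log folder configured in the Logging section to the local machine
databricks fs cp --recursive dbfs:/logs/<cluster-name>/<folder> ./cluster-logs
```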
Steps to deploy a custom build in a Databricks cluster¶
Here are the steps to deploy a custom build in a Databricks cluster:
- Prerequisites:
    - A running Databricks cluster.
- Log in to the Databricks Web UI.
- Click the Workspace icon in the sidebar.
- Navigate to the folder where you want to create the custom build script.
- Click the Create icon at the top right, click File, set the file name to custom_build.sh, and add the script that deploys the custom build in the Databricks cluster to the file (an illustrative sketch is shown after this list).
- Save the custom_build.sh file.
- Click the Compute icon in the sidebar, click the cluster on which the custom build needs to be deployed, and click Edit.
- Scroll down to Advanced options and click Init Scripts.
- Select Workspace as the source for the init script.
- Specify the path of the custom_build.sh script created in step 5, after the plugin init script (ranger_enable.sh for FGAC or ranger_enable_scala.sh for OLAC), which must remain the first script.
- Click the Add button.
- Navigate to the Spark tab and, in the Environment variables section, add the environment variable required for the custom build deployment.
- Click Confirm to save the changes.
- Click Start to restart the cluster.
- Once the cluster has started, check the privacera.out log to validate the custom build deployment. See "Steps to get installation logs and version file" above for how to retrieve it.
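The original custom_build.sh content is not reproduced here. The following is only an illustrative sketch, not the official Privacera script: it assumes the custom build is staged as a JAR in DBFS and copied into the cluster's jar directory after the plugin init script has run, and every path and file name below is a hypothetical placeholder.

```bash
#!/bin/bash
# Illustrative sketch only -- not the official Privacera custom_build.sh.
# Assumes (hypothetically) that the custom build JAR was uploaded to a DBFS staging path
# and is copied into the cluster's jar directory after the plugin init script has run.
CUSTOM_BUILD_JAR="/dbfs/privacera/custom-build/privacera-spark-plugin.jar"   # hypothetical staging path
TARGET_JAR_DIR="/databricks/jars"                                            # hypothetical target directory
PRIVACERA_OUT_FILE=/root/privacera/privacera.out

echo "$(date) : deploying custom build from ${CUSTOM_BUILD_JAR}" >> ${PRIVACERA_OUT_FILE}
cp "${CUSTOM_BUILD_JAR}" "${TARGET_JAR_DIR}/"
echo "$(date) : custom build deployment completed" >> ${PRIVACERA_OUT_FILE}
```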
Steps to Skip the Access Check for the Databricks Cluster Libraries Path¶
When attaching a library to a Databricks cluster, the plugin performs an access check on the library's path. If the required permissions are missing, the library installation fails.
To skip this access check, follow the steps below:
- Go to the Databricks cluster where you want to attach the libraries.
- Click the Edit button on the cluster.
- Under Advanced options, navigate to the Spark tab.
- In the Spark config section, add the Spark property that skips the access check for the Databricks library path, with the value dbfs:/local_disk*.
- If this property is already configured with other paths, append dbfs:/local_disk* to the existing list using a comma separator (see the example after this list).
- Click the Confirm button to save the changes.
- Navigate to the Libraries tab to verify if libraries are already attached to the cluster.
- Click Install New to add a new library.
- Choose the appropriate Library Source and select the desired library.
- Click Install to install the library.
- Click Restart to restart the cluster.
- With the Spark property in place, the plugin will skip access checks on the specified cluster library path.
- Once the cluster restarts, check the Libraries tab. The library status should display as Installed.
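A sketch of the Spark config entries, where <library-path-skip-property> is only a placeholder for the actual Privacera property name (not reproduced here) and the existing path is also a placeholder:

```bash
# <library-path-skip-property> is a placeholder; substitute the Privacera Spark property
# that skips access checks on the cluster libraries path.
<library-path-skip-property> dbfs:/local_disk*

# If the property already lists other paths (placeholder shown), append with a comma:
<library-path-skip-property> dbfs:/some_existing_path*,dbfs:/local_disk*
```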
Handling LogicalPlan Filter Conflicts in Spark FGAC Plugin¶
Warning
Disabling this property is not recommended when using Privacera Row Level Filter policies.
In Spark FGAC (Fine-Grained Access Control) use cases, the plugin transforms the LogicalPlan in multiple phases using custom Scala rules. These transformations support scenarios such as view-based access control, row-level filtering (RLF), and column masking.
However, in certain cases the incoming LogicalPlan may contain multiple filter conditions. These can be evaluated incorrectly by the plugin, potentially causing query failures due to runtime exceptions. For instance, the plugin might throw a ClassCastException if it encounters a filter condition that is incompatible with the expected data type. Such issues typically arise when a filter is applied to a column that has been transformed or implicitly cast to a different type.
To troubleshoot or mitigate this issue, you can disable the plugin's modification of the LogicalPlan by following the steps below:
- Go to the Databricks cluster where the issue is observed.
- Click the Edit button on the cluster.
- Under Advanced options, navigate to the Spark tab.
- In the Spark config section, add the Spark property that skips the plugin's transformation logic for Filter conditions.
- Click the Confirm button to save the changes.
- Click Restart to restart the cluster.
Tip
If the Privacera Row Level Filter policy is enabled and the issue persists, enable the Enhanced Extension. For configuration steps, refer to the Advanced Configuration topic.