Troubleshooting Access Management for Databricks all-purpose compute clusters with Fine-Grained Access Control (FGAC)¶
Steps to get installation logs and version file¶
Here are the steps to get the installation logs and the Privacera plugin version file from DBFS:
- The Privacera init script generates two files in DBFS at the location `dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/`.
- The folder will have two files:
    - `privacera.out`: installation log
    - `privacera_version.txt`: Privacera plugin version details
- To list the files at the DBFS location and copy them to your local machine, use the commands shown below.
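A minimal sketch of these commands, assuming the Databricks CLI is installed and configured against your workspace (adjust the paths to your environment):

```bash
# List the two files generated by the Privacera init script
# (replace <CLUSTER_NAME> with your cluster's name).
databricks fs ls dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/

# Copy the installation log and the version file to the current directory.
databricks fs cp dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/privacera.out .
databricks fs cp dbfs:/privacera/cluster-logs/<CLUSTER_NAME>/privacera_version.txt .
```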
Steps to enable logs for Databricks cluster¶
Here are the steps to enable logs for the `com.privacera` package:
1. Prerequisites: a running Databricks cluster.
2. Log in to the Databricks Web UI.
3. Click the Workspace icon on the sidebar.
4. Navigate to the folder where you want to create the debug script.
5. Click the Create icon at the top right, click File, set the file name to `debug.sh`, and add the content below to the file. The content depends on the Databricks Runtime version:
    - Enable logs in Databricks Runtime Version 10.4 LTS. Note: `LOG_LEVEL` can be set to `INFO`, `DEBUG`, or `TRACE` as per usage.
    - Enable debug logs in Databricks Runtime Version 11.3 LTS and above. Note: in the field `<Logger name="com.privacera" level="<log-level>" additivity="false">`, the log level can be set to `INFO`, `DEBUG`, or `TRACE` as per usage. The `debug.sh` below is this Log4j 2 variant.

debug.sh:
```bash
#!/bin/bash
PRIVACERA_OUT_FILE=/root/privacera/privacera.out
PRIVACERA_CLUSTER_LOGS_DIR=${PRIVACERA_CLUSTER_LOGS_DIR:-/dbfs/privacera/cluster-logs/${DB_CLUSTER_NAME}}

function log(){
  msg=$1
  currentTime=`date`
  echo "${currentTime} : ${msg} " >> ${PRIVACERA_OUT_FILE}
}

log "======================debug.sh execution started!!!======================"

log "enabling logs in driver...."
LOG4J_2_DRIVER_FILE_PATH="/databricks/spark/dbconf/log4j/driver/log4j2.xml"
cat << 'EOF' > ${LOG4J_2_DRIVER_FILE_PATH}
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="INFO" packages="com.databricks.logging" shutdownHook="disable">
  <Appenders>
    <RollingFile name="publicFile.rolling" fileName="logs/log4j-active.log" filePattern="logs/log4j-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
    </RollingFile>
    <Rewrite name="publicFile.rolling.rewrite">
      <ServiceRewriteAppender/>
      <AppenderRef ref="publicFile.rolling"/>
    </Rewrite>
    <RollingFile name="privateFile.rolling" fileName="logs/active.log" filePattern="logs/%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c{1}: %m%n%ex"/>
    </RollingFile>
    <Rewrite name="privateFile.rolling.rewrite">
      <ServiceRewriteAppender/>
      <AppenderRef ref="privateFile.rolling"/>
    </Rewrite>
    <RollingFile name="com.databricks.UsageLogging.appender" fileName="logs/usage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.usage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%m%n%ex"/>
    </RollingFile>
    <RollingFile name="com.databricks.ProductLogging.appender" fileName="logs/product.json" filePattern="logs/%d{yyyy-MM-dd-HH}.product.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%m%n%ex"/>
    </RollingFile>
    <RollingFile name="com.databricks.LineageLogging.appender" fileName="logs/lineage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.lineage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%m%n%ex"/>
    </RollingFile>
    <RollingFile name="com.databricks.MetricsLogging.appender" fileName="logs/metrics.json" filePattern="logs/%d{yyyy-MM-dd-HH}.metrics.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%m%n%ex"/>
    </RollingFile>
    <RollingFile name="dltExecution.rolling" fileName="logs/dlt-execution.log" filePattern="logs/dlt-execution-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
    </RollingFile>
    <Rewrite name="dltExecution.rolling.rewrite">
      <ServiceRewriteAppender/>
      <AppenderRef ref="dltExecution.rolling"/>
    </Rewrite>
  </Appenders>
  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="publicFile.rolling.rewrite"/>
    </Root>
    <Logger name="privateLog" level="INFO" additivity="false">
      <AppenderRef ref="privateFile.rolling.rewrite"/>
    </Logger>
    <Logger name="com.databricks.UsageLogging" level="INFO" additivity="false">
      <AppenderRef ref="com.databricks.UsageLogging.appender"/>
    </Logger>
    <Logger name="com.databricks.ProductLogging" level="INFO" additivity="false">
      <AppenderRef ref="com.databricks.ProductLogging.appender"/>
    </Logger>
    <Logger name="com.databricks.LineageLogging" level="INFO" additivity="false">
      <AppenderRef ref="com.databricks.LineageLogging.appender"/>
    </Logger>
    <Logger name="com.databricks.MetricsLogging" level="INFO" additivity="false">
      <AppenderRef ref="com.databricks.MetricsLogging.appender"/>
    </Logger>
    <Logger name="com.databricks.pipelines" level="INFO" additivity="true">
      <AppenderRef ref="dltExecution.rolling.rewrite"/>
    </Logger>
    <Logger name="org.apache.spark.rdd.NewHadoopRDD" level="WARN"/>
    <Logger name="com.microsoft.azure.datalake.store" level="DEBUG"/>
    <Logger name="com.microsoft.azure.datalake.store.HttpTransport" level="DEBUG">
      <RegexFilter onMatch="DENY" onMismatch="NEUTRAL" regex=".*HTTPRequest,Succeeded.*"/>
    </Logger>
    <Logger name="com.microsoft.azure.datalake.store.HttpTransport.tokens" level="DEBUG"/>
    <!-- privacera -->
    <Logger name="com.privacera" level="INFO" additivity="false">
      <AppenderRef ref="publicFile.rolling.rewrite" />
    </Logger>
    <Logger name="org.apache.ranger" level="DEBUG" additivity="false">
      <AppenderRef ref="publicFile.rolling.rewrite" />
    </Logger>
  </Loggers>
</Configuration>
EOF
log "enabled logs in driver successfully!!"

log "enabling logs in executor...."
LOG4J_2_EXECUTOR_FILE_PATH="/databricks/spark/dbconf/log4j/executor/log4j2.xml"
cat << 'EOF' > ${LOG4J_2_EXECUTOR_FILE_PATH}
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="INFO" packages="com.databricks.logging" shutdownHook="disable">
  <Appenders>
    <Console name="console" target="SYSTEM_ERR">
      <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
    </Console>
    <Rewrite name="console.rewrite">
      <ServiceRewriteAppender/>
      <AppenderRef ref="console"/>
    </Rewrite>
    <RollingFile name="com.databricks.UsageLogging.appender" fileName="logs/usage.json" filePattern="logs/%d{yyyy-MM-dd-HH}.usage.json.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%m%n%ex"/>
    </RollingFile>
    <RollingFile name="publicFile.rolling" fileName="logs/log4j-active.log" filePattern="logs/log4j-%d{yyyy-MM-dd-HH}.log.gz" immediateFlush="true" bufferedIO="false" bufferSize="8192" createOnDemand="true">
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
      <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p %c:%L [%t] - %m%n%ex"/>
    </RollingFile>
    <Rewrite name="publicFile.rolling.rewrite">
      <ServiceRewriteAppender/>
      <AppenderRef ref="publicFile.rolling"/>
    </Rewrite>
  </Appenders>
  <Loggers>
    <Logger name="org.apache.spark.rdd.NewHadoopRDD" level="WARN"/>
    <Logger name="com.microsoft.azure.datalake.store" level="DEBUG"/>
    <Logger name="com.microsoft.azure.datalake.store.HttpTransport" level="DEBUG">
      <RegexFilter onMatch="DENY" onMismatch="NEUTRAL" regex=".*HTTPRequest,Succeeded.*"/>
    </Logger>
    <Logger name="com.microsoft.azure.datalake.store.HttpTransport.tokens" level="DEBUG"/>
    <Logger name="com.databricks.UsageLogging" level="INFO" additivity="false">
      <AppenderRef ref="com.databricks.UsageLogging.appender"/>
    </Logger>
    <Root level="INFO">
      <AppenderRef ref="console.rewrite"/>
      <AppenderRef ref="publicFile.rolling.rewrite"/>
    </Root>
    <!-- privacera -->
    <Logger name="com.privacera" level="INFO" additivity="false">
      <AppenderRef ref="console.rewrite" />
      <AppenderRef ref="publicFile.rolling.rewrite"/>
    </Logger>
    <Logger name="org.apache.ranger" level="INFO" additivity="false">
      <AppenderRef ref="console.rewrite" />
      <AppenderRef ref="publicFile.rolling.rewrite"/>
    </Logger>
  </Loggers>
</Configuration>
EOF
log "enabled logs in executor successfully!!"

log "======================debug.sh execution completed!!!======================"
log "Copying privacera.out to ${PRIVACERA_CLUSTER_LOGS_DIR}"
cp ${PRIVACERA_OUT_FILE} ${PRIVACERA_CLUSTER_LOGS_DIR}/privacera.out
```
6. Save the `debug.sh` file.
7. Click the Compute icon on the sidebar, click the cluster where logs need to be enabled, and edit the cluster.
8. Scroll down to Advanced options and click Init Scripts.
9. Select Workspace as the source for the init script.
10. Specify the path of the `debug.sh` script created in step 5. Add it after the plugin script (`ranger_enable.sh` for FGAC or `ranger_enable_scala.sh` for OLAC), which must remain the first script.
11. Click the Add button.
12. In the Logging section, define the path to store the logs:
    - Select DBFS as the Destination for the logs folder.
    - In Log Path, add the path, e.g. `dbfs:/logs/<cluster-name>/<folder>`.
13. Click Confirm to save the changes.
14. Click Start to restart the cluster.
15. To copy the logs to your local machine, run the command shown after these steps.
16. Once the cluster is started, check the `privacera.out` logs to validate the `debug.sh` script execution. Refer here for more details.
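A minimal sketch of the copy command, assuming the Databricks CLI is configured and that the Log Path follows the example in step 12 (adjust the paths to your setup):

```bash
# Recursively copy the cluster log folder from DBFS to the local machine
# (replace <cluster-name>/<folder> with the Log Path set in step 12).
databricks fs cp --recursive dbfs:/logs/<cluster-name>/<folder> ./cluster-logs
```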
Steps to deploy custom build in Databricks cluster¶
Here are the steps to deploy a custom build in a Databricks cluster:
1. Prerequisites: a running Databricks cluster.
2. Log in to the Databricks Web UI.
3. Click the Workspace icon on the sidebar.
4. Navigate to the folder where you want to create the custom build script.
5. Click the Create icon at the top right, click File, set the file name to `custom_build.sh`, and add the script to deploy the custom build to the file (see the sketch below).
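The actual `custom_build.sh` contents are not reproduced here; the sketch below is an illustrative assumption only, not the official Privacera script. It assumes the custom-build jar is staged in DBFS, that its location is supplied through a hypothetical `PRIVACERA_CUSTOM_BUILD_PATH` environment variable (set on the cluster in step 12), and that `/databricks/jars` is the target directory. Confirm the actual script and paths with Privacera support.

```bash
#!/bin/bash
# Illustrative sketch only -- not the official custom_build.sh.
# PRIVACERA_CUSTOM_BUILD_PATH is a hypothetical cluster environment variable
# (see step 12) pointing to the custom-build jar staged in DBFS.
PRIVACERA_OUT_FILE=/root/privacera/privacera.out
JARS_DIR=/databricks/jars   # assumed target directory for the plugin jar

echo "$(date) : deploying custom build from ${PRIVACERA_CUSTOM_BUILD_PATH}" >> ${PRIVACERA_OUT_FILE}

if [ -f "${PRIVACERA_CUSTOM_BUILD_PATH}" ]; then
  # Copy the custom-build jar into the cluster's jar directory.
  cp "${PRIVACERA_CUSTOM_BUILD_PATH}" "${JARS_DIR}/"
  echo "$(date) : custom build deployed successfully" >> ${PRIVACERA_OUT_FILE}
else
  echo "$(date) : ERROR - custom build not found at ${PRIVACERA_CUSTOM_BUILD_PATH}" >> ${PRIVACERA_OUT_FILE}
fi
```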
6. Save the `custom_build.sh` file.
7. Click the Compute icon on the sidebar, click the cluster where the custom build needs to be deployed, and edit the cluster.
8. Scroll down to Advanced options and click Init Scripts.
9. Select Workspace as the source for the init script.
10. Specify the path of the `custom_build.sh` script created in step 5. Add it after the plugin script (`ranger_enable.sh` for FGAC or `ranger_enable_scala.sh` for OLAC), which must remain the first script.
11. Click the Add button.
12. Navigate to the Spark tab and, in the Environment variables section, add the environment variable shown after these steps.
13. Click Confirm to save the changes.
14. Click Start to restart the cluster.
15. Once the cluster is started, check the `privacera.out` logs to validate the custom build deployment. Refer here for more details.
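The exact variable is not shown here; an illustrative assumption, matching the hypothetical `PRIVACERA_CUSTOM_BUILD_PATH` used in the `custom_build.sh` sketch above (confirm the actual name and value with Privacera support):

```bash
# Hypothetical environment variable consumed by the custom_build.sh sketch.
PRIVACERA_CUSTOM_BUILD_PATH=/dbfs/privacera/custom-build/privacera-spark-plugin.jar
```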