Steps to deploy custom build in Databricks cluster
Follow these steps to deploy a custom build on a Databricks cluster:
- Prerequisites:
  - A running Databricks cluster.
- Log in to the Databricks Web UI.
- Click the Workspace icon in the sidebar.
- Navigate to the folder where you want to create the custom build script.
- Click the Create icon at the top right, click File, set the file name to custom_build.sh, and add the following content to the file:
Script to deploy a custom build in a Databricks cluster (custom_build.sh):

```bash
#!/bin/bash
# Replaces the Privacera and Ranger jars on the cluster with a custom build.

PRIVACERA_OUT_FILE=/root/privacera/privacera.out
PRIVACERA_CLUSTER_LOGS_DIR=${PRIVACERA_CLUSTER_LOGS_DIR:-/dbfs/privacera/cluster-logs/${DB_CLUSTER_NAME}}
LOG_LEVEL=INFO

# Append a timestamped message to the Privacera log file.
function log() {
  local msg=$1
  local currentTime
  currentTime=$(date)
  echo "${currentTime} : ${msg} " >> "${PRIVACERA_OUT_FILE}"
}

# Make sure the log file's directory exists before the first log call.
mkdir -p "$(dirname "${PRIVACERA_OUT_FILE}")"

log "======================custom_build.sh execution started!!!======================"

# The package URL must be supplied through the cluster's environment variables.
if [[ -z "${CUSTOM_BUILD_PKG}" ]]; then
  log "Error: CUSTOM_BUILD_PKG is not set or is empty. Please provide a valid URL in Environment variables."
  exit 1
fi

log "Creating temporary directory for the custom build..."
mkdir -p /tmp/custom
cd /tmp/custom || { log "Error: Failed to change directory to /tmp/custom."; exit 1; }

log "Downloading custom Privacera Spark Plugin from ${CUSTOM_BUILD_PKG}..."
if ! wget "${CUSTOM_BUILD_PKG}" -O privacera-spark-plugin.tar.gz; then
  log "Error: Failed to download the package from ${CUSTOM_BUILD_PKG}. Please check the URL."
  exit 1
fi

if ! tar -xzf privacera-spark-plugin.tar.gz; then
  log "Error: Failed to extract privacera-spark-plugin.tar.gz."
  exit 1
fi

# Record the checksum of the current jar so the replacement can be verified in the log.
OLD_MD5_SUM=$(md5sum /databricks/jars/privacera-agent.jar 2>/dev/null | awk '{print $1}')
log "md5 checksum of the existing privacera-agent.jar: ${OLD_MD5_SUM}"

log "Removing existing Privacera and Ranger jars..."
rm -rf /databricks/jars/ranger-* /databricks/jars/privacera-agent.jar

log "Copying new jars to /databricks/jars/..."
if ! cp -r spark-plugin/* /databricks/jars/; then
  log "Error: Failed to copy new jars to /databricks/jars/."
  exit 1
fi

MD5_SUM=$(md5sum /databricks/jars/privacera-agent.jar | awk '{print $1}')
log "md5 checksum of the privacera-agent.jar: ${MD5_SUM}"
log "Deployed custom build successfully."
log "======================custom_build.sh execution completed!!!======================"

log "Copying privacera.out to ${PRIVACERA_CLUSTER_LOGS_DIR}"
mkdir -p "${PRIVACERA_CLUSTER_LOGS_DIR}"
cp "${PRIVACERA_OUT_FILE}" "${PRIVACERA_CLUSTER_LOGS_DIR}/privacera.out"
```
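The script copies everything under spark-plugin/ from the extracted archive and then checksums /databricks/jars/privacera-agent.jar, so the archive presumably contains spark-plugin/privacera-agent.jar. A minimal sketch for checking that layout before attaching the init script is shown here. The function name check_package_layout is hypothetical, and the expected layout is an assumption inferred from the script, not a documented Privacera contract.

```shell
#!/bin/bash
# Hypothetical helper: verify that a downloaded package has the layout
# that custom_build.sh appears to expect (assumption inferred from the script).
check_package_layout() {
  local archive=$1
  local listing
  # List the archive contents without extracting; fail if it is not a readable tar.gz.
  if ! listing=$(tar -tzf "${archive}" 2>/dev/null); then
    echo "Error: ${archive} is not a readable tar.gz archive."
    return 1
  fi
  # tar may print entries with or without a leading "./".
  if ! printf '%s\n' "${listing}" | grep -Eq '^(\./)?spark-plugin/privacera-agent\.jar$'; then
    echo "Error: spark-plugin/privacera-agent.jar not found in ${archive}."
    return 1
  fi
  echo "OK: ${archive} contains spark-plugin/privacera-agent.jar."
}
```

For example, after downloading the package locally, `check_package_layout privacera-spark-plugin.tar.gz` returns a non-zero status if the archive is unreadable or the jar is missing.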
- Save the custom_build.sh file.
- Click the Compute icon in the sidebar, select the cluster where the custom build needs to be deployed, and click Edit.
- Scroll down to Advanced options and click Init Scripts.
- Select Workspace as the source for init scripts.
- Specify the path of the custom_build.sh script created earlier. Add it after the plugin scripts (ranger_enable.sh for FGAC and ranger_enable_scala.sh for OLAC); the plugin script must be the first script in the list.
- Click the Add button.
- Navigate to the Spark tab and, in the Environment variables section, add the following environment variable:
```bash
CUSTOM_BUILD_PKG=<Custom package URL provided by Privacera>
```
- Click Confirm to save changes.
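For reference, the init script order and the environment variable set in the UI correspond to fields in the cluster's JSON definition. The sketch below prints such a fragment, assuming the Databricks Clusters API field names `init_scripts` and `spark_env_vars`. The function name print_cluster_settings and all paths and URLs are hypothetical placeholders. The plugin script is listed first and custom_build.sh after it, matching the required order.

```shell
#!/bin/bash
# Hypothetical helper: print a cluster-settings fragment equivalent to the UI steps.
# Arguments: plugin init script path, custom_build.sh path, custom package URL.
print_cluster_settings() {
  local plugin_script=$1
  local custom_script=$2
  local package_url=$3
  # Init scripts run in the order listed, so the plugin script comes first.
  cat <<EOF
{
  "init_scripts": [
    { "workspace": { "destination": "${plugin_script}" } },
    { "workspace": { "destination": "${custom_script}" } }
  ],
  "spark_env_vars": {
    "CUSTOM_BUILD_PKG": "${package_url}"
  }
}
EOF
}

# Example call with placeholder paths and URL (assumptions, not real values).
print_cluster_settings \
  "/Shared/privacera/ranger_enable.sh" \
  "/Shared/privacera/custom_build.sh" \
  "https://example.com/privacera-spark-plugin.tar.gz"
```

Comparing this fragment with the JSON view of the cluster configuration is one way to confirm that the scripts are listed in the intended order.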
- Click on Start to restart the cluster.
- Once the cluster has started, check the privacera.out logs to validate the custom build deployment. Refer here for more details.
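The validation in the last step can be sketched as a small helper that looks for the messages custom_build.sh writes to the log. The function name validate_custom_build is hypothetical. The path passed to it depends on where you read privacera.out: the script writes /root/privacera/privacera.out on the cluster and copies it to the directory given by PRIVACERA_CLUSTER_LOGS_DIR.

```shell
#!/bin/bash
# Hypothetical helper: inspect a privacera.out file for the outcome of custom_build.sh.
validate_custom_build() {
  local log_file=$1
  if [[ ! -f "${log_file}" ]]; then
    echo "Error: ${log_file} not found."
    return 1
  fi
  # The script logs this exact message only after the new jars are copied.
  if ! grep -qF "Deployed custom build successfully." "${log_file}"; then
    echo "Custom build deployment not confirmed in ${log_file}."
    return 1
  fi
  # Print the before/after checksum lines so the jar replacement can be compared by eye.
  grep -F "md5 checksum of" "${log_file}"
  echo "Custom build deployment confirmed."
}
```

For example, with the script's default log location, you might run `validate_custom_build /dbfs/privacera/cluster-logs/<cluster name>/privacera.out` and compare the two checksum lines it prints.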