Steps to deploy a custom build in a Databricks cluster

Follow these steps to deploy a custom build in a Databricks cluster:

  1. Prerequisites:
    • A running Databricks cluster.
  2. Log in to the Databricks web UI.
  3. Click the Workspace icon in the sidebar.
  4. Navigate to the folder where you want to create the custom build script.
  5. Click the Create icon at the top right, click File, name the file custom_build.sh, and add the following content to the file:

    #!/bin/bash
    
    PRIVACERA_OUT_FILE=/root/privacera/privacera.out
    PRIVACERA_CLUSTER_LOGS_DIR=${PRIVACERA_CLUSTER_LOGS_DIR:-/dbfs/privacera/cluster-logs/${DB_CLUSTER_NAME}}
    
    LOG_LEVEL=INFO
    # Append a timestamped message to the Privacera log file.
    function log(){
      local msg=$1
      local currentTime
      currentTime=$(date)
      echo "${currentTime} : ${msg} " >> "${PRIVACERA_OUT_FILE}"
    }
    
    log "======================custom_build.sh execution started!!!======================"
    if [[ -z "${CUSTOM_BUILD_PKG}" ]]; then
      log "Error: CUSTOM_BUILD_PKG is not set or is empty. Please provide a valid URL in the cluster's environment variables."
      exit 1
    fi
    
    log "Creating temporary directory for the custom build..."
    mkdir -p /tmp/custom
    cd /tmp/custom || { log "Error: Failed to change directory to /tmp/custom."; exit 1; }
    
    log "Downloading custom Privacera Spark Plugin from ${CUSTOM_BUILD_PKG}..."
    if ! wget "${CUSTOM_BUILD_PKG}" -O privacera-spark-plugin.tar.gz; then
      log "Error: Failed to download the package from ${CUSTOM_BUILD_PKG}. Please check the URL."
      exit 1
    fi
    
    tar -xzf privacera-spark-plugin.tar.gz
    if [[ $? -ne 0 ]]; then
      log "Error: Failed to extract privacera-spark-plugin.tar.gz."
      exit 1
    fi
    
    # Abort before touching /databricks/jars if the archive lacks the expected directory.
    if [[ ! -d spark-plugin ]]; then
      log "Error: spark-plugin directory not found in privacera-spark-plugin.tar.gz."
      exit 1
    fi
    
    OLD_MD5_SUM=$(md5sum /databricks/jars/privacera-agent.jar | awk '{print $1}')
    log "md5 checksum of the existing privacera-agent.jar: ${OLD_MD5_SUM}"
    log "Removing existing Privacera and Ranger jars..."
    rm -rf /databricks/jars/ranger-* /databricks/jars/privacera-agent.jar
    
    log "Copying new jars to /databricks/jars/..."
    if ! cp -r spark-plugin/* /databricks/jars/; then
      log "Error: Failed to copy the new jars to /databricks/jars/."
      exit 1
    fi
    
    MD5_SUM=$(md5sum /databricks/jars/privacera-agent.jar | awk '{print $1}')
    log "md5 checksum of the privacera-agent.jar: ${MD5_SUM}"
    log "Deployed custom build successfully."
    log "======================custom_build.sh execution completed!!!======================"
    
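    log "Copying privacera.out to ${PRIVACERA_CLUSTER_LOGS_DIR}"
    # Create the target directory in case no earlier script has done so.
    mkdir -p "${PRIVACERA_CLUSTER_LOGS_DIR}"
    cp "${PRIVACERA_OUT_FILE}" "${PRIVACERA_CLUSTER_LOGS_DIR}/privacera.out"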
    
  6. Save the custom_build.sh file.

  7. Click the Compute icon in the sidebar, select the cluster where the custom build needs to be deployed, and click Edit.
  8. Scroll down to Advanced options and click Init Scripts.
    • Select Workspace as the source for the init script.
    • Specify the path of the custom_build.sh script created in step 5. Add it after the plugin script (ranger_enable.sh for FGAC or ranger_enable_scala.sh for OLAC), which must remain the first script in the list.
  9. Click the Add button.
  10. Navigate to the Spark tab and, in the Environment variables section, add the following environment variable:
    CUSTOM_BUILD_PKG=<Custom package URL provided by Privacera>
    
  11. Click Confirm to save changes.
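  12. Click Start to restart the cluster.
  13. Once the cluster has started, check the privacera.out logs to validate the custom build deployment. A successful run logs "Deployed custom build successfully." along with the md5 checksums of the old and new privacera-agent.jar. Refer here for more details.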
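
The log check in the last step can be scripted. The following is a minimal sketch, not part of Privacera's tooling: the function name check_custom_build_log and its status words (OK, UNCHANGED, FAILED, MISSING) are hypothetical, and it assumes the log lines have exactly the wording written by custom_build.sh above.

```shell
#!/bin/bash
# Sketch: summarize the outcome of custom_build.sh from a copy of privacera.out.
# check_custom_build_log is a hypothetical helper name, not part of Privacera's tooling.

check_custom_build_log() {
  local log_file="$1"
  local old_sum new_sum

  if [ ! -f "${log_file}" ]; then
    echo "MISSING"
    return 2
  fi

  # custom_build.sh logs this message only after the new jars have been copied.
  if ! grep -q "Deployed custom build successfully" "${log_file}"; then
    echo "FAILED"
    # Show the most recent error line, if any, to help with diagnosis.
    grep "Error:" "${log_file}" | tail -n 1
    return 1
  fi

  # Extract the checksums logged before and after the jar replacement.
  old_sum=$(sed -n 's/.*md5 checksum of the existing privacera-agent\.jar: *\([0-9a-f]*\).*/\1/p' "${log_file}" | tail -n 1)
  new_sum=$(sed -n 's/.*md5 checksum of the privacera-agent\.jar: *\([0-9a-f]*\).*/\1/p' "${log_file}" | tail -n 1)

  if [ -n "${new_sum}" ] && [ "${old_sum}" = "${new_sum}" ]; then
    # Same checksum before and after: the package may contain the same jar.
    echo "UNCHANGED ${new_sum}"
  else
    echo "OK ${old_sum} -> ${new_sum}"
  fi
}
```

For example, from a notebook %sh cell you could call check_custom_build_log with the path /dbfs/privacera/cluster-logs/<cluster name>/privacera.out, which is the script's default PRIVACERA_CLUSTER_LOGS_DIR location. An UNCHANGED result does not necessarily indicate a failure. It only means that the jar's checksum was the same before and after the run.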
