
Setup for Access Management for Apache Spark

Configure

This section outlines the steps to set up the Apache Spark OLAC or OLAC_FGAC connector with the Privacera plugin. Ensure that all prerequisites are completed before beginning the setup process.

Perform the following steps to configure the Apache Spark OLAC connector:

  1. SSH into the instance where Privacera Manager is installed.

  2. Run the following commands to navigate to the config directory and copy the sample vars file:

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.spark-standalone.yml custom-vars/
    

  3. Modify the following properties:

    • In the vars.spark-standalone.yml file, update the following properties with the appropriate values:

      #Spark Env name.
      SPARK_ENV_TYPE: "<PLEASE_CHANGE>"
      
      #Add the spark home path
      #Eg: "/opt/spark"
      SPARK_HOME: "<PLEASE_CHANGE>"
      
      #Spark user home directory
      #Eg: "/opt/privacera"
      SPARK_USER_HOME: "<PLEASE_CHANGE>"
      

      Variable         Definition                                       Example
      SPARK_ENV_TYPE   Set the environment type.                        privacera_spark_olac
      SPARK_HOME       Home path of your Spark installation.            /opt/spark
      SPARK_USER_HOME  User home directory of your Spark installation.  /opt/privacera
  4. Once the properties are configured, update your Privacera Manager platform instance by running the following commands:

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh post-install
    
  5. Once the post-install process is complete, you will see a spark-standalone folder in the ~/privacera/privacera-manager/output directory with the following structure:

    output/
    ├── spark-standalone/
    │   ├── spark_custom_conf/ 
    │   │   ├── FGAC/ 
    │   │   │   ├── README.md
    │   │   │   ├── jwttoken.pub
    │   │   │   ├── privacera_spark.properties
    │   │   ├── OLAC/
    │   │   │   ├── privacera_spark.properties
    │   │   │   ├── README.md
    │   │   ├── OLAC_FGAC/
    │   │   │   ├── README.md
    │   │   │   ├── jwttoken.pub
    │   │   │   ├── privacera_spark.properties
    │   │   ├── auditserver-secrets-keystore.jks
    │   │   ├── global-truststore.p12
    │   │   ├── ranger-plugin-keystore.p12
    │   │   ├── ranger.jceks
    │   ├── privacera_setup.sh
    │   ├── spark_custom_conf.zip
    │   ├── standalone_spark_FGAC.sh
    │   ├── standalone_spark_OLAC.sh
    │   ├── standalone_spark_OLAC_FGAC.sh
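As a quick sanity check before moving on, you can confirm the key artifacts were generated. A minimal sketch (the checked file names come from the structure above; the path argument is whatever your install location is):

```shell
# Verify the post-install output contains the expected setup artifacts.
check_spark_standalone_output() {
  dir=$1
  for f in privacera_setup.sh spark_custom_conf.zip standalone_spark_OLAC.sh; do
    [ -f "${dir}/${f}" ] || { echo "missing: ${f}"; return 1; }
  done
  echo "spark-standalone output looks complete"
}

# Example:
# check_spark_standalone_output ~/privacera/privacera-manager/output/spark-standalone
```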
    

Setup Scripts and Configuration Files

Creating Script Files

  1. Create privacera-oss-plugin folder and navigate to it

    mkdir ~/privacera-oss-plugin
    cd ~/privacera-oss-plugin
    

  2. Create a penv.sh file and copy the following content to it

    cd ~/privacera-oss-plugin
    vi penv.sh
    

    penv.sh
    #!/bin/bash
    
    ## Set up environment variables for Privacera Spark plugin deployment
    
    
    # Privacera configuration
    export PRIVACERA_BASE_DOWNLOAD_URL="<PRIVACERA_BASE_DOWNLOAD_URL>"
    # Supported spark-plugin types are: OLAC, OLAC_FGAC
    export PRIVACERA_SPARK_PLUGIN_TYPE="OLAC"
    
    # Uncomment this to enable delta lake support
    #export OSS_DELTA_LAKE_ENABLE="true"
    
    
    # Spark configuration
    # Spark Docker image (e.g. spark:3.5.5-python3)
    export SPARK_BASE_IMAGE="<SPARK_BASE_IMAGE>"
    # Spark version (e.g., 3.5.5)
    export SPARK_VERSION="<SPARK_VERSION>"
    
    
    # Docker image configuration
    export SPARK_PLUGIN_IMAGE="<SPARK_PLUGIN_IMAGE>"
    export SPARK_DOCKER_PULL_SECRET="<SPARK_DOCKER_PULL_SECRET>"
    export TAG="<DOCKER_IMAGE_TAG>"
    
    
    ## K8S configuration
    export SPARK_NAME_SPACE="<SPARK_NAME_SPACE>"
    export PRIVACERA_SECRET_NAME="privacera-spark-secret"
    export PRIVACERA_CONFIGMAP_NAME="privacera-spark-configmap"
    
    export SPARK_PLUGIN_ROLE_BINDING="privacera-sa-spark-plugin-role-binding"
    export SPARK_PLUGIN_SERVICE_ACCOUNT="privacera-sa-spark-plugin"
    export SPARK_PLUGIN_ROLE="privacera-sa-spark-plugin-role"
    export SPARK_PLUGIN_APP_NAME="privacera-spark-examples"
    
    
    # Input and output directories for k8s templates
    export INPUT_DIR="./templates"
    export OUTPUT_DIR="./output"
    
    • To get the PRIVACERA_BASE_DOWNLOAD_URL, run the following command on the instance where Privacera Manager is installed, then set the resulting URL in the penv.sh script file:

      grep -i 'PRIVACERA_BASE_DOWNLOAD_URL=' ~/privacera/privacera-manager/output/spark-standalone/standalone_spark_OLAC.sh
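The grep above prints the whole export line. If you want just the URL value for pasting into penv.sh, a small sketch, assuming the generated script double-quotes the value:

```shell
# Extract the URL value from a line like:
#   export PRIVACERA_BASE_DOWNLOAD_URL="https://example.com/downloads"
extract_download_url() {
  grep -i 'PRIVACERA_BASE_DOWNLOAD_URL=' "$1" | head -n1 | cut -d'"' -f2
}

# Example:
# extract_download_url ~/privacera/privacera-manager/output/spark-standalone/standalone_spark_OLAC.sh
```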
      

      Note

      To enable OLAC_FGAC, update PRIVACERA_SPARK_PLUGIN_TYPE in the penv.sh script as shown below:

      PRIVACERA_SPARK_PLUGIN_TYPE="OLAC_FGAC"
      

  3. Run the following command to create and edit the build_privacera_plugin.sh file. Expand the following section and copy its content into the build_privacera_plugin.sh file:

    vi build_privacera_plugin.sh
    

    build_privacera_plugin.sh
    #!/bin/bash
    
    set -x
    
    SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
    ENV_FILE="${SCRIPT_DIR}/penv.sh"
    
    # if penv.sh exists, load it
    if [ -f ${ENV_FILE} ]; then
      echo "Sourcing env file ${ENV_FILE}"
      source ${ENV_FILE}
    else
      echo "${ENV_FILE} file not found"
      exit 1
    fi
    
    SPARK_HOME=${SCRIPT_DIR}/spark_home/opt/spark
    PRIVACERA_HOME=${SCRIPT_DIR}/privacera_home
    PRIVACERA_WORK_DIR=${PRIVACERA_HOME}/work
    
    PRIVACERA_SPARK_PLUGIN_TYPE=${PRIVACERA_SPARK_PLUGIN_TYPE}
    ENV_TYPE=PLATFORM
    SPARK_DEFAULT_CONF=${SPARK_HOME}/conf/spark-defaults.conf
    USER_NAME=spark
    GROUP_NAME=spark
    
    export PRIVACERA_HOME=${PRIVACERA_HOME}
    export PRIVACERA_WORK_DIR=${PRIVACERA_WORK_DIR}
    export PRIVACERA_DOWNLOAD=$PRIVACERA_HOME/downloads
    export SPARK_DEFAULT_CONF=${SPARK_DEFAULT_CONF}
    export PRIVACERA_SPARK_PLUGIN_TYPE=${PRIVACERA_SPARK_PLUGIN_TYPE}
    export PRIVACERA_BASE_DOWNLOAD_URL=${PRIVACERA_BASE_DOWNLOAD_URL}
    export ENV_TYPE=${ENV_TYPE}
    export OSS_DELTA_LAKE_ENABLE=${OSS_DELTA_LAKE_ENABLE}
    export SPARK_VERSION=${SPARK_VERSION}
    export PRIVACERA_SETUP_SCRIPT="privacera_setup.sh"
    export SPARK_USER=${USER_NAME}
    export SPARK_GROUP=${GROUP_NAME}
    export SPARK_HOME=${SPARK_HOME}
    
    rm -rf ${PRIVACERA_HOME}
    rm -rf ${SPARK_HOME}
    
    mkdir -p ${PRIVACERA_HOME}
    mkdir -p ${SPARK_HOME}
    mkdir -p ${SPARK_HOME}/jars
    mkdir -p ${SPARK_HOME}/conf
    
    touch ${SPARK_HOME}/conf/spark-defaults.conf
    
    # run the actual script that downloads and installs the plugin
    bash scripts/privacera_setup.sh
    
    cp ${PRIVACERA_HOME}/work/privacera_version.txt ${SPARK_HOME}/conf
    sed -i "s|${SPARK_HOME}|/opt/spark|g" ${SPARK_HOME}/conf/spark-defaults.conf
    
    tar cvfz privacera-spark-plugin.tar.gz -C ${SPARK_HOME}/../.. .
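After the script finishes, you can list the generated tarball to confirm it contains what you expect. A hedged sketch (the tarball name comes from the script above; the entry to check for is up to you):

```shell
# Confirm a given entry appears in the generated plugin tarball listing.
tarball_contains() {
  tar tzf "$1" | grep -q "$2"
}

# Example:
# tarball_contains privacera-spark-plugin.tar.gz spark-defaults.conf
```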
    
  4. Create scripts folder and navigate to it

    mkdir ~/privacera-oss-plugin/scripts
    cd ~/privacera-oss-plugin/scripts
    

  5. Create privacera_setup.sh script file and copy the following content to it

    vi privacera_setup.sh
    

    privacera_setup.sh
    #!/bin/bash
    
    set -x
    
    echo "SPARK_DEFAULT_CONF ${SPARK_DEFAULT_CONF}"
    echo "SPARK_HOME ${SPARK_HOME}"
    echo "PRIVACERA_SPARK_PLUGIN_TYPE ${PRIVACERA_SPARK_PLUGIN_TYPE}"
    echo "PRIVACERA_BASE_DOWNLOAD_URL ${PRIVACERA_BASE_DOWNLOAD_URL}"
    echo "SPARK_VERSION ${SPARK_VERSION}"
    
    export ENV_TYPE=${ENV_TYPE:-"PLATFORM"}
    export OSS_DELTA_LAKE_ENABLE=${OSS_DELTA_LAKE_ENABLE:-false}
    export SPARK_VERSION=${SPARK_VERSION}
    
    # spark folders
    export SPARK_PLUGIN_TYPE=${PRIVACERA_SPARK_PLUGIN_TYPE}
    SCRIPT_NAME=standalone_spark_setup.sh
    
    SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
    
    chmod +x ${SCRIPT_DIR}/${SCRIPT_NAME}
    . ${SCRIPT_DIR}/${SCRIPT_NAME}
    
  6. Create standalone_spark_setup.sh script file and copy the following content to it

    vi standalone_spark_setup.sh
    

    standalone_spark_setup.sh
    #!/bin/bash
    
    set -x
    
    #no need to change
    SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
    PRIVACERA_WORK_DIR=${PRIVACERA_WORK_DIR:-${SCRIPT_DIR}/work}
    PRIVACERA_OUT_FILE=${PRIVACERA_OUT_FILE:-${PRIVACERA_WORK_DIR}/privacera.out}
    PKG_NAME=${PKG_NAME:-privacera-spark-plugin.tar.gz}
    PRIVACERA_CONF_FOLDER=${PRIVACERA_CONF_FOLDER:-${SPARK_HOME}/conf}
    PRIVACERA_CUSTOM_CONF_ZIP=${PRIVACERA_CUSTOM_CONF_ZIP:-spark_custom_conf.zip}
    SPARK_DEFAULT_CONF=${SPARK_HOME}/conf/spark-defaults.conf
    SPARK_PLUGIN_TYPE=${SPARK_PLUGIN_TYPE:-"OLAC"}
    OSS_DELTA_LAKE_ENABLE=${OSS_DELTA_LAKE_ENABLE:-false}
    ENV_TYPE=${ENV_TYPE:-"PLATFORM"}
    SPARK_VERSION=${SPARK_VERSION}
    
    if [[ $ENV_TYPE != "PCLOUD" ]]; then
      # for platform we have to append as we are using same path for both
      export PRIVACERA_BASE_DOWNLOAD_URL=${PRIVACERA_BASE_DOWNLOAD_URL}/spark-plugin
    fi
    
    rm -rf ${PRIVACERA_OUT_FILE}
    function log(){
      msg=$1
      currentTime=`date`
      echo "${currentTime} : ${msg} " >> ${PRIVACERA_OUT_FILE}
    }
    
    function createFolders() {
      # delete existing jars and the work folder so upgrades don't leave stale jars (jar names differ in every release)
      rm -rf ${SPARK_HOME}/jars/ranger-*
      rm -rf ${SPARK_HOME}/jars/privacera*
      rm -rf ${PRIVACERA_WORK_DIR}
      mkdir -p ${PRIVACERA_WORK_DIR}
    }
    
    function validation(){
      if [[ $SPARK_HOME == "" ]];then
        log " SPARK_HOME can't be empty "
        exit 1;
      fi
      log "SPARK_HOME = ${SPARK_HOME}"
    }
    
    download_pkg(){
      log "Downloading pkg from ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
      cd ${PRIVACERA_WORK_DIR}
      if [[ ${PRIVACERA_BASE_DOWNLOAD_URL} == https* ]]; then
        log "Downloading pkg using wget ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
        wget -nv ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME} -O ${PKG_NAME}
      else
        log "download pkg path is not yet supported ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
      fi
    }
    
    function install() {
      log "Extracting ${PRIVACERA_WORK_DIR}/${PKG_NAME} into ${PRIVACERA_WORK_DIR}"
      tar xzf ${PRIVACERA_WORK_DIR}/${PKG_NAME} --directory ${PRIVACERA_WORK_DIR}
    
      #putting privacera jar into spark home
      cp -r ${PRIVACERA_WORK_DIR}/spark-plugin/* $SPARK_HOME/jars/
    
      if [ ! -f $SPARK_DEFAULT_CONF ]; then
       cp $SPARK_DEFAULT_CONF.template $SPARK_DEFAULT_CONF
      fi
    }
    
    function update_jars() {
      # find java major version
      JAVA_MAJOR_VERSION=$(java -version 2>&1 | sed -E -n 's/.* version "([^.-]*).*"/\1/p' | cut -d' ' -f1)
      echo "Java major version: ${JAVA_MAJOR_VERSION}"
      if [ ${JAVA_MAJOR_VERSION} -ge 15 ]; then
        cp -r ${PRIVACERA_WORK_DIR}/jdk15-jars/* $SPARK_HOME/jars/ranger-spark-plugin-impl/
      fi
    }
    
    function configure() {
        log "Configure started"
    
        #  NOTE: Equivalent of this will be done in the entrypoint script
        PRIVACERA_CONF_WORK_DIR="${PRIVACERA_WORK_DIR}/spark-conf"
    
        cp ${PRIVACERA_CONF_WORK_DIR}/resource_type_plugin_map.json ${PRIVACERA_CONF_FOLDER}/
    
        #setting spark agent
        if [ `grep -rin "spark.driver.extraJavaOptions" $SPARK_DEFAULT_CONF |  wc -l` -gt 0 ];then
          sed -i '' -e 's|spark.driver.extraJavaOptions .*|spark.driver.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
        else
          echo "spark.driver.extraJavaOptions -javaagent:${SPARK_HOME}/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties" >> $SPARK_DEFAULT_CONF
        fi
    
        if [[ "${SPARK_PLUGIN_TYPE}" == *"FGAC"* ]]; then
          log "Spark python/sql plugin "
          if [ `grep -rin "spark.sql.extensions" $SPARK_DEFAULT_CONF |  wc -l` -gt 0 ];then
            sed -i '' -e 's|spark.sql.extensions.*|spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension|g' $SPARK_DEFAULT_CONF
          else
            echo "#Setting Privacera spark-plugin properties" >> $SPARK_DEFAULT_CONF
            echo "spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension" >> $SPARK_DEFAULT_CONF
          fi
        elif [[ "${SPARK_PLUGIN_TYPE}" == *"OLAC"* ]]; then
          echo "Setting agent for executor"
            #setting spark agent
            if [ `grep -rin "# spark.executor.extraJavaOptions" $SPARK_DEFAULT_CONF |  wc -l` -gt 0 ];then
              sed -i '' -e 's|# spark.executor.extraJavaOptions .*|spark.executor.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
            elif [ `grep -rin "spark.executor.extraJavaOptions" $SPARK_DEFAULT_CONF |  wc -l` -gt 0 ];then
              sed -i '' -e 's|spark.executor.extraJavaOptions .*|spark.executor.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
            else
              echo "spark.executor.extraJavaOptions -javaagent:${SPARK_HOME}/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties " >> $SPARK_DEFAULT_CONF
            fi
        fi
    
        ## add or append 'spark.sql.hive.metastore.sharedPrefixes'
        if [ `grep -rin "^spark.sql.hive.metastore.sharedPrefixes" $SPARK_DEFAULT_CONF |  wc -l` -gt 0 ];then
          sharedPrefixes=`grep -ri "^spark.sql.hive.metastore.sharedPrefixes" $SPARK_DEFAULT_CONF`
          sharedPrefixes="${sharedPrefixes#spark.sql.hive.metastore.sharedPrefixes}"
          # trim leading whitespaces ###
          sharedPrefixes="${sharedPrefixes##*( )}"
          # trim trailing whitespaces  ##
          sharedPrefixes="${sharedPrefixes%%*( )}"
          sed -i -e "s|^spark.sql.hive.metastore.sharedPrefixes .*|#spark.sql.hive.metastore.sharedPrefixes ${sharedPrefixes}|g" $SPARK_DEFAULT_CONF
          echo "spark.sql.hive.metastore.sharedPrefixes ${sharedPrefixes},com.privacera,com.amazonaws" >> $SPARK_DEFAULT_CONF
        else
          echo "spark.sql.hive.metastore.sharedPrefixes com.privacera,com.amazonaws" >> $SPARK_DEFAULT_CONF
        fi
    
        log "Configure completed"
    }
    
    function setup_deltalake() {
        # configure deltalake support
        log "setup_deltalake started"
    
        if [ "${OSS_DELTA_LAKE_ENABLE}" == true ]; then
    
            if [ "${SPARK_VERSION}" = "3.4.0" ] || [ "${SPARK_VERSION}" = "3.4.1" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-core_2.12/2.4.0/delta-core_2.12-2.4.0.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/2.4.0/delta-storage-2.4.0.jar -P "${SPARK_HOME}/jars/"
    
            elif [ "${SPARK_VERSION}" = "3.5.0" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.0.0/delta-spark_2.12-3.0.0.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.0.0/delta-storage-3.0.0.jar -P "${SPARK_HOME}/jars/"
    
            elif [ "${SPARK_VERSION}" = "3.5.1" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.1.0/delta-spark_2.12-3.1.0.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.1.0/delta-storage-3.1.0.jar -P "${SPARK_HOME}/jars/"
    
            elif [ "${SPARK_VERSION}" = "3.5.3" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.2.1/delta-spark_2.12-3.2.1.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.2.1/delta-storage-3.2.1.jar -P "${SPARK_HOME}/jars/"
    
            elif [ "${SPARK_VERSION}" = "3.5.4" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.3.0/delta-spark_2.12-3.3.0.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.3.0/delta-storage-3.3.0.jar -P "${SPARK_HOME}/jars/"
    
            elif [ "${SPARK_VERSION}" = "3.5.5" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.3.1/delta-spark_2.12-3.3.1.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.3.1/delta-storage-3.3.1.jar -P "${SPARK_HOME}/jars/"
    
            else
                cp -r "${PRIVACERA_WORK_DIR}/oss-delta-jars/"* "${SPARK_HOME}/jars/"
            fi
    
            # update spark-defaults.conf
            if [ $(grep -rin "spark.sql.catalog.spark_catalog" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
                sed -i '' -e 's|spark.sql.catalog.spark_catalog.*|spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog|g' "$SPARK_DEFAULT_CONF"
            else
                echo "spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog" >> "$SPARK_DEFAULT_CONF"
            fi
    
            if [[ "${SPARK_PLUGIN_TYPE}" == *"FGAC"* ]]; then
                if [ $(grep -rin "spark.sql.extensions" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
                    sed -i "s|spark.sql.extensions.*|spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension,io.delta.sql.DeltaSparkSessionExtension|g" "$SPARK_DEFAULT_CONF"
                else
                    echo "spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension,io.delta.sql.DeltaSparkSessionExtension" >> "$SPARK_DEFAULT_CONF"
                fi
            else
                if [ $(grep -rin "spark.sql.extensions" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
                    sed -i "s|spark.sql.extensions.*|spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension|g" "$SPARK_DEFAULT_CONF"
                else
                    echo "spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension" >> "$SPARK_DEFAULT_CONF"
                fi
            fi
    
        fi  
    }
    
    function  verify() {
        log "Privacera Spark setup completed"
    }
    
    createFolders
    validation
    download_pkg
    install
    update_jars
    configure
    setup_deltalake
    verify
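The configure() function above repeats the same grep-then-sed-or-append pattern for each property. The same idea in isolation, as a minimal sketch (the property names in the usage comment are illustrative; GNU sed shown, BSD/macOS sed needs `sed -i ''`):

```shell
# Set "<key> <value>" in a spark-defaults.conf-style file:
# replace the existing line if the key is present, otherwise append it.
set_conf_property() {
  conf_file=$1; key=$2; value=$3
  if grep -q "^${key} " "${conf_file}"; then
    sed -i -e "s|^${key} .*|${key} ${value}|" "${conf_file}"
  else
    echo "${key} ${value}" >> "${conf_file}"
  fi
}

# Example:
# set_conf_property conf/spark-defaults.conf spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension
```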
    
  7. Make the privacera_setup.sh and standalone_spark_setup.sh script files executable

    chmod +x privacera_setup.sh standalone_spark_setup.sh
    

Create Configuration Files

  • Create config folder and navigate to it
    mkdir ~/privacera-oss-plugin/config
    cd ~/privacera-oss-plugin/config
    
  • For the OLAC plugin type, copy the privacera_spark.properties and global-truststore.p12 files to the config folder

    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/OLAC/privacera_spark.properties ~/privacera-oss-plugin/config/
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/global-truststore.p12 ~/privacera-oss-plugin/config/
    

  • Navigate to the config folder and update the privacera_spark.properties file as follows:

    cd ~/privacera-oss-plugin/config/
    vi privacera_spark.properties
    
    privacera.signer.truststore=/opt/privacera/global-truststore.p12
    

  • For the OLAC_FGAC plugin type, copy the privacera_spark.properties file and the required keystore files to the config folder

    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/OLAC_FGAC/privacera_spark.properties ~/privacera-oss-plugin/config/
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/auditserver-secrets-keystore.jks ~/privacera-oss-plugin/config/
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/ranger-plugin-keystore.p12 ~/privacera-oss-plugin/config/
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/ranger.jceks ~/privacera-oss-plugin/config/
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/global-truststore.p12 ~/privacera-oss-plugin/config/
    

  • Navigate to the config folder and update the privacera_spark.properties file as follows:

    cd ~/privacera-oss-plugin/config/
    vi privacera_spark.properties
    
    privacera.signer.truststore=/opt/privacera/global-truststore.p12
    xasecure.audit.keystore.path=/opt/privacera/auditserver-secrets-keystore.jks
    xasecure.policymgr.clientssl.keystore=/opt/privacera/ranger-plugin-keystore.p12
    xasecure.policymgr.clientssl.keystore.credential.file=jceks://file//opt/privacera/ranger.jceks
    xasecure.policymgr.clientssl.truststore=/opt/privacera/global-truststore.p12
    xasecure.policymgr.clientssl.truststore.credential.file=jceks://file//opt/privacera/ranger.jceks
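These properties point at files that must be present under /opt/privacera at runtime, so it helps to verify that the copies above all landed in the config folder before building the image. A sketch (file names taken from the steps above):

```shell
# Confirm each expected OLAC_FGAC artifact exists in the given config folder.
check_config_files() {
  dir=$1
  for f in privacera_spark.properties global-truststore.p12 \
           auditserver-secrets-keystore.jks ranger-plugin-keystore.p12 ranger.jceks; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "missing: ${f}"
      return 1
    fi
  done
  echo "all config files present"
}

# Example:
# check_config_files ~/privacera-oss-plugin/config
```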
    

  • Remove the following property from the privacera_spark.properties file:

    vi privacera_spark.properties
    privacera.clusterName=<CLUSTER_NAME>
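As a non-interactive alternative to editing the file in vi, the property can be dropped with sed. A sketch (GNU sed shown; on macOS/BSD use `sed -i ''`):

```shell
# Delete the privacera.clusterName entry from a properties file in place.
remove_cluster_name() {
  sed -i '/^privacera\.clusterName=/d' "$1"
}

# Example:
# remove_cluster_name ~/privacera-oss-plugin/config/privacera_spark.properties
```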
    

  • Create a log4j2.properties file and copy the following content into it

    vi log4j2.properties
    

    log4j2.properties
    # Root logger configuration
    rootLogger.level = info
    rootLogger.appenderRefs = stdout, file
    rootLogger.appenderRef.stdout.ref = console
    rootLogger.appenderRef.file.ref = RollingFileAppender
    
    # console appender
    appender.console.type = Console
    appender.console.name = console
    appender.console.target = SYSTEM_ERR
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} [%t] %p %c :%L %m%n%ex
    
    ## rolling file config
    appender.rolling.type = RollingFile
    appender.rolling.name = RollingFileAppender
    appender.rolling.fileName = /tmp/${sys:user.name}/privacera.log
    appender.rolling.filePattern = /tmp/${sys:user.name}/privacera-%d{yyyy-MM-dd}-%i.log
    appender.rolling.layout.type = PatternLayout
    appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} [%t] %p %c :%L %m%n%ex
    appender.rolling.policies.type = Policies
    appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
    appender.rolling.policies.time.interval = 1
    appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.rolling.policies.size.size = 100MB
    
    # privacera
    logger.privacera.name = com.privacera
    logger.privacera.level = info
    logger.privacera.additivity = false
    logger.privacera.appenderRefs = rolling
    logger.privacera.appenderRef.rolling.ref = RollingFileAppender
    
    # ranger
    logger.ranger.name = org.apache.ranger
    logger.ranger.level = info
    logger.ranger.additivity = false
    logger.ranger.appenderRefs = rolling
    logger.ranger.appenderRef.rolling.ref = RollingFileAppender
    
    # aws sdk v1
    logger.amazon.name = com.amazon
    logger.amazon.level = info
    logger.amazon.additivity = false
    logger.amazon.appenderRefs = rolling
    logger.amazon.appenderRef.rolling.ref = RollingFileAppender
    
    logger.amazonaws.name = com.amazonaws
    logger.amazonaws.level = info
    logger.amazonaws.additivity = false
    logger.amazonaws.appenderRefs = rolling
    logger.amazonaws.appenderRef.rolling.ref = RollingFileAppender
    
    # aws sdk v2
    logger.software-amazon.name = software.amazon.awssdk
    logger.software-amazon.level = info
    logger.software-amazon.additivity = false
    logger.software-amazon.appenderRefs = rolling
    logger.software-amazon.appenderRef.rolling.ref = RollingFileAppender
    
    # apache http
    logger.apache.name = org.apache.http.wire
    logger.apache.level = info
    logger.apache.additivity = false
    logger.apache.appenderRefs = rolling
    logger.apache.appenderRef.rolling.ref = RollingFileAppender
    
    # apache http client
    logger.httpclient.name = org.apache.http.client
    logger.httpclient.level = ERROR
    logger.httpclient.additivity = false
    logger.httpclient.appenderRefs = console
    logger.httpclient.appenderRef.console.ref = console
    

Generate Privacera Deployment File

  • Execute the build_privacera_plugin.sh script to generate the Privacera deployment tarball:
    cd ~/privacera-oss-plugin
    chmod +x build_privacera_plugin.sh
    ./build_privacera_plugin.sh
    

Building the Docker Image

Create Dockerfile

  • Create Dockerfile script file and copy the following content to it

    Note

    • The Dockerfile provided here is an example that uses the open-source Apache Spark images. Alternatively, you can use your own custom Spark Dockerfile and integrate the steps below to deploy the Privacera plugin into it.
    • To do that, add the following commands to your existing Dockerfile:
      ARG PRIVACERA_SPARK_PLUGIN_TAR_GZ=./privacera-spark-plugin.tar.gz
      
      # Add the privacera-spark-plugin tar.gz to the image at the root
      ADD ${PRIVACERA_SPARK_PLUGIN_TAR_GZ} /
      
      RUN mkdir -p ${PRIVACERA_HOME}
      RUN chown -R ${USER_NAME}:${GROUP_NAME} ${PRIVACERA_HOME}
      
      # Symlinks for common files
      RUN ln -sf /privacera-secret/privacera_spark.properties /opt/spark/conf/privacera_spark.properties && \
          ln -sf /privacera-secret/global-truststore.p12 /opt/privacera/global-truststore.p12 && \
          ln -sf /privacera-conf/log4j2.properties /opt/spark/conf/log4j2.properties
      
      # Symlinks for OLAC_FGAC
      RUN if [ "$PRIVACERA_SPARK_PLUGIN_TYPE" = "OLAC_FGAC" ]; then \
          ln -sf /privacera-secret/auditserver-secrets-keystore.jks /opt/privacera/auditserver-secrets-keystore.jks && \
          ln -sf /privacera-secret/ranger-plugin-keystore.p12 /opt/privacera/ranger-plugin-keystore.p12 && \
          ln -sf /privacera-secret/ranger.jceks /opt/privacera/ranger.jceks ; \
          fi
      
    cd ~/privacera-oss-plugin
    vi Dockerfile
    
    Dockerfile
    ARG SPARK_BASE_IMAGE
    FROM ${SPARK_BASE_IMAGE}
    
    ARG SPARK_VERSION
    ARG PRIVACERA_SPARK_PLUGIN_TAR_GZ=./privacera-spark-plugin.tar.gz
    ARG PRIVACERA_SPARK_PLUGIN_TYPE
    ARG SPARK_HOME=/opt/spark
    ARG USER_NAME=spark
    ARG GROUP_NAME=spark
    ARG PRIVACERA_HOME=/opt/privacera
    
    ENV PRIVACERA_HOME=${PRIVACERA_HOME}
    ENV PRIVACERA_SPARK_PLUGIN_TYPE=${PRIVACERA_SPARK_PLUGIN_TYPE}
    ENV OSS_DELTA_LAKE_ENABLE=${OSS_DELTA_LAKE_ENABLE}
    ENV SPARK_VERSION=${SPARK_VERSION}
    ENV SPARK_USER=${USER_NAME}
    ENV SPARK_GROUP=${GROUP_NAME}
    ENV SPARK_HOME=${SPARK_HOME}
    
    USER root
    
    # Download AWS JARs for supported Spark versions
    RUN if [ "${SPARK_VERSION}" = "3.5.3" ] || \
           [ "${SPARK_VERSION}" = "3.5.4" ] || \
           [ "${SPARK_VERSION}" = "3.5.5" ]; then \
            wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar \
                -P ${SPARK_HOME}/jars/; \
            wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.781/aws-java-sdk-bundle-1.12.781.jar \
                -P ${SPARK_HOME}/jars/; \
        else \
            echo "No hadoop-aws and aws-java-sdk-bundle jars downloaded"; \
        fi
    
    # Add the privacera-spark-plugin tar.gz to the image at the root
    ADD ${PRIVACERA_SPARK_PLUGIN_TAR_GZ} /
    
    # Set up Privacera directories and permissions
    RUN mkdir -p ${PRIVACERA_HOME}
    RUN chown -R ${USER_NAME}:${GROUP_NAME} ${PRIVACERA_HOME}
    
    # Symlinks for common files
    RUN ln -sf /privacera-secret/privacera_spark.properties /opt/spark/conf/privacera_spark.properties && \
        ln -sf /privacera-secret/global-truststore.p12 /opt/privacera/global-truststore.p12 && \
        ln -sf /privacera-conf/log4j2.properties /opt/spark/conf/log4j2.properties
    
    # Symlinks for OLAC_FGAC
    RUN if [ "$PRIVACERA_SPARK_PLUGIN_TYPE" = "OLAC_FGAC" ]; then \
          ln -sf /privacera-secret/auditserver-secrets-keystore.jks /opt/privacera/auditserver-secrets-keystore.jks && \
          ln -sf /privacera-secret/ranger-plugin-keystore.p12 /opt/privacera/ranger-plugin-keystore.p12 && \
          ln -sf /privacera-secret/ranger.jceks /opt/privacera/ranger.jceks ; \
        fi
    
    
    USER ${USER_NAME}
    

Build the Docker Image

  1. Run the following command to create and edit the build_spark_image.sh file. Expand the following section and copy its content into the build_spark_image.sh file:

    cd ~/privacera-oss-plugin
    vi build_spark_image.sh
    

    build_spark_image.sh
    #!/bin/bash
    
    set -x
    
    SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
    ENV_FILE="${SCRIPT_DIR}/penv.sh"
    
    # if penv.sh exists, load it
    if [ -f ${ENV_FILE} ]; then
      echo "Sourcing env file ${ENV_FILE}"
      source ${ENV_FILE}
    else
      echo "${ENV_FILE} file not found"
      exit 1
    fi
    
    # Fallback defaults if not set in penv.sh
    SPARK_BASE_IMAGE="${SPARK_BASE_IMAGE:-spark:3.5.3-python3}"
    SPARK_VERSION="${SPARK_VERSION:-3.5.3}"
    
    # create plugin image
    DOCKER_BUILDKIT=1 docker build \
        -t ${TAG} \
        --build-arg SPARK_BASE_IMAGE=${SPARK_BASE_IMAGE} \
        --build-arg SPARK_VERSION=${SPARK_VERSION} \
        --build-arg PRIVACERA_SPARK_PLUGIN_TYPE=${PRIVACERA_SPARK_PLUGIN_TYPE} \
        --progress plain \
        -f Dockerfile .
    
  2. Execute the build_spark_image.sh script to build the Privacera Spark plugin Docker image:

    chmod +x build_spark_image.sh
    ./build_spark_image.sh
    

Push the Docker Image to the Remote HUB

  • Use your internal container registry (hub) to publish the image.
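The exact commands depend on your registry, but a typical flow tags the locally built image with the remote name from penv.sh and pushes it. A dry-run sketch (the image names in the example are placeholders):

```shell
# Compose the tag/push commands from the penv.sh variables.
# With DRY_RUN=true the commands are printed instead of invoking docker.
push_plugin_image() {
  SPARK_PLUGIN_IMAGE=$1; TAG=$2; DRY_RUN=${3:-true}
  run() { if [ "${DRY_RUN}" = "true" ]; then echo "$*"; else "$@"; fi; }
  run docker tag "${TAG}" "${SPARK_PLUGIN_IMAGE}"
  run docker push "${SPARK_PLUGIN_IMAGE}"
}

# Example (dry run):
# push_plugin_image registry.example.com/spark-plugin:1.0 spark-plugin:1.0 true
```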

Kubernetes Deployment

Create Kubernetes YAML Template Files

  1. Create k8s/templates folder and navigate to it

    mkdir -p ~/privacera-oss-plugin/k8s/templates
    cd ~/privacera-oss-plugin/k8s/templates
    

  2. Create a namespace.yml file and copy the following content into it

    vi namespace.yml
    

    namespace.yml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: SPARK_NAME_SPACE
    
  3. Create a role-binding.yml file and copy the following content into it

    vi role-binding.yml
    

    role-binding.yml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: SPARK_PLUGIN_ROLE_BINDING
      namespace: SPARK_NAME_SPACE
    subjects:
    - kind: ServiceAccount
      name: SPARK_PLUGIN_SERVICE_ACCOUNT
      namespace: SPARK_NAME_SPACE
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: SPARK_PLUGIN_ROLE
    
  4. Create a role.yml file and copy the following content into it

    vi role.yml
    

    role.yml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: SPARK_PLUGIN_ROLE
      namespace: SPARK_NAME_SPACE
    rules:
    - apiGroups: ["", "extensions", "apps"]
      resources: ["*"]
      verbs: ["*"]
    - apiGroups: ["batch"]
      resources:
      - jobs
      - cronjobs
      verbs: ["*"]
    
  5. Create a service-account.yml file and copy the following content into it

    vi service-account.yml
    

    service-account.yml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: SPARK_PLUGIN_SERVICE_ACCOUNT
      namespace: SPARK_NAME_SPACE
    
  6. Create a privacera-spark-examples.yml file and copy the following content into it

    vi privacera-spark-examples.yml
    

    privacera-spark-examples.yml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: SPARK_PLUGIN_APP_NAME
      name: SPARK_PLUGIN_APP_NAME
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: SPARK_PLUGIN_APP_NAME
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: SPARK_PLUGIN_APP_NAME
        spec:
          serviceAccountName: SPARK_PLUGIN_SERVICE_ACCOUNT
          securityContext:
            fsGroup: 200
          imagePullSecrets:
            - name: SPARK_DOCKER_PULL_SECRET
          containers:
            - image: SPARK_PLUGIN_IMAGE
              name: SPARK_PLUGIN_APP_NAME-exec
              imagePullPolicy: Always
              command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
              env:
                - name: SPARK_PLUGIN_POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: SPARK_PLUGIN_POD_IP
                  valueFrom:
                    fieldRef:
                      fieldPath: status.podIP
              ports:
                - name: spark-ui
                  containerPort: 4040
                - name: sparkdriver
                  containerPort: 7077
                - name: blockmanager
                  containerPort: 7078
              resources:
                limits:
                  memory: "2Gi"
                  cpu: "0.5"
                requests:
                  memory: "1Gi"
                  cpu: "0.2"
              volumeMounts:
                - name: privacera-spark-secret-volume
                  mountPath: /privacera-secret
                - name: privacera-spark-conf-volume
                  mountPath: /privacera-conf
          restartPolicy: Always
          volumes:
            - name: privacera-spark-secret-volume
              secret:
                secretName: PRIVACERA_SECRET_NAME
            - name: privacera-spark-conf-volume
              configMap:
                name: PRIVACERA_CONFIGMAP_NAME
    status: {}
    

Create Scripts to Generate Kubernetes Deployment Files from Templates

  1. Create a replace.sh script and copy the following content into it:

    cd ~/privacera-oss-plugin/k8s  
    vi replace.sh
    

    replace.sh
    #!/bin/bash
    
    set -x
    
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    
    ENV_FILE="${SCRIPT_DIR}/../penv.sh"
    
    # if penv.sh exists, load it
    if [ -f "${ENV_FILE}" ]; then
      echo "Sourcing env file ${ENV_FILE}"
      source "${ENV_FILE}"
    else
      echo "${ENV_FILE} file not found"
      exit 1
    fi
    
    SPARK_NAME_SPACE=${SPARK_NAME_SPACE:-privacera-spark-plugin-test}
    SPARK_PLUGIN_ROLE_BINDING=${SPARK_PLUGIN_ROLE_BINDING:-privacera-sa-spark-plugin-role-binding}
    SPARK_PLUGIN_SERVICE_ACCOUNT=${SPARK_PLUGIN_SERVICE_ACCOUNT:-privacera-sa-spark-plugin}
    SPARK_PLUGIN_ROLE=${SPARK_PLUGIN_ROLE:-privacera-sa-spark-plugin-role}
    SPARK_PLUGIN_APP_NAME=${SPARK_PLUGIN_APP_NAME:-privacera-spark-examples}
    SPARK_PLUGIN_IMAGE=${SPARK_PLUGIN_IMAGE:-hub.docker.us/spark-plugin:latest}
    SPARK_DOCKER_PULL_SECRET=${SPARK_DOCKER_PULL_SECRET:-docker-hub}
    INPUT_DIR=${INPUT_DIR:-./templates}
    OUTPUT_DIR=${OUTPUT_DIR:-./output}
    
    mkdir -p ${OUTPUT_DIR}
    
    files=(namespace.yml role-binding.yml service-account.yml privacera-spark-examples.yml role.yml)
    for i in ${!files[@]}
    do
        in_file="${INPUT_DIR}/${files[i]}"
        out_file="${OUTPUT_DIR}/${files[i]}"
        if [ -f "${in_file}" ]; then
          echo "Replacing variables in ${in_file} -> ${out_file}"
          cp "${in_file}" "${out_file}"
          sed -i "s|SPARK_NAME_SPACE|${SPARK_NAME_SPACE}|g" "${out_file}"
          sed -i "s|SPARK_PLUGIN_ROLE_BINDING|${SPARK_PLUGIN_ROLE_BINDING}|g" "${out_file}"
          sed -i "s|SPARK_PLUGIN_SERVICE_ACCOUNT|${SPARK_PLUGIN_SERVICE_ACCOUNT}|g" "${out_file}"
          sed -i "s|SPARK_PLUGIN_ROLE|$SPARK_PLUGIN_ROLE|g" "${out_file}"
          sed -i "s|SPARK_PLUGIN_APP_NAME|$SPARK_PLUGIN_APP_NAME|g" "${out_file}"
          sed -i "s|SPARK_PLUGIN_IMAGE|$SPARK_PLUGIN_IMAGE|g" "${out_file}"
          sed -i "s|SPARK_DOCKER_PULL_SECRET|$SPARK_DOCKER_PULL_SECRET|g" "${out_file}"
          sed -i "s|PRIVACERA_SECRET_NAME|${PRIVACERA_SECRET_NAME}|g" "${out_file}"
          sed -i "s|PRIVACERA_CONFIGMAP_NAME|${PRIVACERA_CONFIGMAP_NAME}|g" "${out_file}"
        fi
    done
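The substitution replace.sh performs can be tried in isolation before running the full script. The sketch below builds a throwaway sample template (hypothetical content, not the real k8s/templates files) and applies the same `sed` pattern:

```shell
# Self-contained sketch of the sed substitution replace.sh performs.
# The sample template content here is hypothetical.
demo_dir=$(mktemp -d)
printf 'metadata:\n  name: SPARK_NAME_SPACE\n' > "${demo_dir}/namespace.yml"

SPARK_NAME_SPACE=privacera-spark-plugin-test
# GNU sed syntax; on macOS/BSD sed use: sed -i '' "s|...|...|g" file
sed -i "s|SPARK_NAME_SPACE|${SPARK_NAME_SPACE}|g" "${demo_dir}/namespace.yml"

rendered=$(cat "${demo_dir}/namespace.yml")
echo "${rendered}"
rm -rf "${demo_dir}"
```

If the placeholder is still present in the output, check that the token in your template matches the `sed` pattern exactly.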
    
  2. Create an apply.sh script and copy the following content into it:

    vi apply.sh
    

    apply.sh
    #!/bin/bash
    
    set -x
    
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    
    ENV_FILE="${SCRIPT_DIR}/../penv.sh"
    
    # if penv.sh exists, load it
    if [ -f "${ENV_FILE}" ]; then
      echo "Sourcing env file ${ENV_FILE}"
      source "${ENV_FILE}"
    else
      echo "${ENV_FILE} file not found"
      exit 1
    fi
    
    # Use the same defaults as replace.sh when penv.sh does not set these
    OUTPUT_DIR=${OUTPUT_DIR:-./output}
    SPARK_NAME_SPACE=${SPARK_NAME_SPACE:-privacera-spark-plugin-test}
    
    kubectl apply -f ${OUTPUT_DIR}/namespace.yml
    kubectl apply -f ${OUTPUT_DIR}/service-account.yml
    kubectl apply -f ${OUTPUT_DIR}/role.yml
    kubectl apply -f ${OUTPUT_DIR}/role-binding.yml
    
    # Delete and recreate Kubernetes secret
    kubectl delete secret ${PRIVACERA_SECRET_NAME} -n ${SPARK_NAME_SPACE} --ignore-not-found
    
    if [[ "${PRIVACERA_SPARK_PLUGIN_TYPE}" == "OLAC" ]]; then
      kubectl create secret generic "${PRIVACERA_SECRET_NAME}" \
        --from-file="${SCRIPT_DIR}/../config/privacera_spark.properties" \
        --from-file="${SCRIPT_DIR}/../config/global-truststore.p12" \
        -n "${SPARK_NAME_SPACE}"
    
    elif [[ "${PRIVACERA_SPARK_PLUGIN_TYPE}" == "OLAC_FGAC" ]]; then
      kubectl create secret generic "${PRIVACERA_SECRET_NAME}" \
        --from-file="${SCRIPT_DIR}/../config/privacera_spark.properties" \
        --from-file="${SCRIPT_DIR}/../config/global-truststore.p12" \
        --from-file="${SCRIPT_DIR}/../config/auditserver-secrets-keystore.jks" \
        --from-file="${SCRIPT_DIR}/../config/ranger-plugin-keystore.p12" \
        --from-file="${SCRIPT_DIR}/../config/ranger.jceks" \
        -n "${SPARK_NAME_SPACE}"
    fi
    
    # Delete and recreate Kubernetes configmap
    kubectl delete configmap ${PRIVACERA_CONFIGMAP_NAME} -n ${SPARK_NAME_SPACE} --ignore-not-found
    kubectl create configmap ${PRIVACERA_CONFIGMAP_NAME} \
      --from-file=${SCRIPT_DIR}/../config/log4j2.properties \
      -n ${SPARK_NAME_SPACE}
    
    kubectl apply -f ${OUTPUT_DIR}/privacera-spark-examples.yml -n ${SPARK_NAME_SPACE}
    

Apply the Kubernetes Deployment Files

  1. Make the apply.sh and replace.sh script files executable

    chmod +x apply.sh replace.sh
    

  2. Execute the replace.sh script to render the templates from the k8s/templates folder into the output folder

    ./replace.sh
    

  3. Execute the apply.sh script to create necessary Kubernetes secrets and apply Spark deployment configs

    ./apply.sh
    

Validate the Deployment

  1. Check the status of the pods

    export SPARK_NAME_SPACE=<SPARK_NAME_SPACE>
    kubectl get pods -n ${SPARK_NAME_SPACE}
    

  2. Access the pod

    kubectl exec -it <SPARK_PLUGIN_POD_NAME> -n ${SPARK_NAME_SPACE} -- /bin/bash
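Rather than copying the pod name by hand, it can be looked up via the `app` label the Deployment sets. The sketch below stubs `kubectl` (with a made-up pod name) so it runs without a cluster; drop the stub for real use:

```shell
# Stub kubectl so the sketch runs without a cluster; the pod name it
# returns is hypothetical. Remove this function on a real cluster.
kubectl() { echo "privacera-spark-examples-6c9f7d-abc12"; }

SPARK_NAME_SPACE=privacera-spark-plugin-test      # default from replace.sh
SPARK_PLUGIN_APP_NAME=privacera-spark-examples    # default from replace.sh

# First pod carrying the Deployment's app label.
POD_NAME=$(kubectl get pods -n "${SPARK_NAME_SPACE}" \
  -l app="${SPARK_PLUGIN_APP_NAME}" \
  -o jsonpath='{.items[0].metadata.name}')
echo "${POD_NAME}"
```

With the stub removed, pass `${POD_NAME}` to the `kubectl exec` command above.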
    
